Linear regression, least squares, logistic regression, and the SVM kernel

The basic idea of Lasso is to minimize the residual sum of squares subject to the constraint that the sum of the absolute values of the regression coefficients is less than a constant. Under this constraint, some regression coefficients are driven exactly to zero, which yields a more interpretable model.
Lasso (Least Absolute Shrinkage and Selection Operator, Tibshirani (1996)) is a shrinkage estimation method. By constructing a penalty function it produces a more refined model: it shrinks some coefficients and sets others exactly to zero. It therefore keeps the advantages of subset selection alongside shrinkage, and as a biased estimator it can handle data with multicollinearity.
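
A minimal sketch of how Lasso zeroes out coefficients, using scikit-learn's `Lasso` (the toy data and the `alpha` value below are assumptions for illustration, not from the original post):

```python
import numpy as np
from sklearn.linear_model import Lasso

# Toy data: y depends only on the first two of five features.
rng = np.random.RandomState(0)
X = rng.randn(100, 5)
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.randn(100)

# The L1 penalty (weighted by alpha) shrinks coefficients and
# sets some of them exactly to zero.
model = Lasso(alpha=0.5)
model.fit(X, y)
print(model.coef_)  # the three irrelevant features get coefficients of exactly 0.0
```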

https://www.zhihu.com/question/37031188

https://blog.csdn.net/hzw19920329/article/details/77200475

The difference between linear regression and logistic regression

https://blog.csdn.net/jiaoyangwm/article/details/81139362

The mathematical way to find an extremum is to set the derivative of the formula to 0 and solve. In practice, however, the formula can be complicated and the computation heavy, so from a computational standpoint the extremum is found by gradient descent.
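
A minimal gradient-descent sketch for least-squares linear regression (the learning rate and iteration count below are arbitrary assumptions):

```python
import numpy as np

def gradient_descent(X, y, lr=0.01, n_iters=1000):
    """Minimize the mean squared error of y ~ X @ w by gradient descent."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    for _ in range(n_iters):
        residual = X @ w - y
        grad = 2.0 / n_samples * X.T @ residual  # gradient of MSE w.r.t. w
        w -= lr * grad                           # step against the gradient
    return w

# Usage: recover w ~ [2, -1] from noiseless toy data.
X = np.random.randn(200, 2)
y = X @ np.array([2.0, -1.0])
print(gradient_descent(X, y))
```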

This is just a mathematical convenience.


Explanation: if a sample is positive, we want the predicted probability p of it being positive to be as large as possible; the larger that value, the better the decision function, and hence the larger log p, the better. The value of the logistic regression decision function is the probability that the sample is positive.

If a sample is negative, we want the predicted probability of it being negative to be as large as possible, i.e. the larger (1 - p), the better, which means the larger log(1 - p), the better.
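
In symbols (with label $y \in \{0, 1\}$ and $p$ the predicted probability of the positive class), the two cases combine into a single per-sample log-likelihood:

$$
\log L(y, p) = y \log p + (1 - y)\log(1 - p),
$$

which reduces to $\log p$ when $y = 1$ and to $\log(1 - p)$ when $y = 0$.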


Differences
1. Logistic regression is essentially linear regression passed through a nonlinear transformation (the sigmoid); see the sketch below.
2. A log is added: the loss is built from log p and log(1 - p) rather than squared error.
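
A minimal sketch of point 1 (the weights and input here are made-up values for illustration):

```python
import numpy as np

def sigmoid(z):
    """Map the linear score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

w, b = np.array([2.0, -1.0]), 0.5
x = np.array([1.0, 3.0])
z = w @ x + b      # linear part, same form as linear regression
print(sigmoid(z))  # nonlinear squashing, read as P(y = 1 | x)
```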

Here Insert Picture Description

First recall the Bernoulli distribution and the meaning of the likelihood function: given the model and the observed samples, the likelihood of a positive sample is the probability p that it is positive.
https://blog.csdn.net/zhenghaitian/article/details/80968986
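
Concretely (a standard derivation, not quoted from the linked post): a single Bernoulli label has probability $P(y \mid p) = p^{y}(1-p)^{1-y}$, so over $n$ independent samples the likelihood is

$$
L(\theta) = \prod_{i=1}^{n} p_i^{\,y_i}(1-p_i)^{\,1-y_i}, \qquad p_i = \sigma(\theta^{\top} x_i).
$$

Maximizing $\log L(\theta)$ is exactly minimizing the cross-entropy loss discussed next.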

In nndl (Neural Networks and Deep Learning), the loss function of logistic regression is the cross-entropy loss.
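
A minimal sketch of that cross-entropy loss (the clipping constant and example labels/probabilities are assumptions):

```python
import numpy as np

def cross_entropy(y_true, p_pred, eps=1e-12):
    """Mean negative Bernoulli log-likelihood, i.e. log loss."""
    p = np.clip(p_pred, eps, 1.0 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

y = np.array([1, 0, 1, 1])
p = np.array([0.9, 0.2, 0.7, 0.6])
print(cross_entropy(y, p))  # lower is better
```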


SVM

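A minimal kernel-SVM sketch using scikit-learn's `SVC` (the RBF kernel choice and the toy data are assumptions for illustration, not from the original post):

```python
import numpy as np
from sklearn.svm import SVC

# Toy 2-class data that is not linearly separable in the input space.
rng = np.random.RandomState(0)
X = rng.randn(200, 2)
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0).astype(int)  # circular boundary

# The kernel trick lets the SVM separate the classes in a
# higher-dimensional feature space without computing it explicitly.
clf = SVC(kernel="rbf", gamma="scale")
clf.fit(X, y)
print(clf.score(X, y))
```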


Origin: blog.csdn.net/qq_32450111/article/details/86628056