Avoiding Overfitting in Linear Regression (Part 1): Three Regularized Linear Models (Ridge Regression, Lasso, Elastic Net)

Regularized Linear Models



1. Ridge Regression (also known as Tikhonov regularization)

Linear regression + L2 regularization

The loss function has the form:

objective function = loss function + regularization term

Ridge regression is a regularized version of linear regression: a regularization term

    α · Σᵢ₌₁ⁿ θᵢ²

is added to the cost function of ordinary linear regression, so that the model fits the data while also keeping the weights as small as possible. The ridge regression cost function:

    J(θ) = MSE(θ) + α · Σᵢ₌₁ⁿ θᵢ²

where MSE(θ) is the usual mean squared error of linear regression, and the bias term θ₀ is not included in the regularization sum.

  • α = 0: ridge regression degenerates to ordinary linear regression
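As a minimal sketch of the above (the dataset and the alpha value are made up for illustration), ridge regression in scikit-learn, including the α = 0 degenerate case:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Toy 1-D dataset: y = 3x + 4 + noise
rng = np.random.default_rng(42)
X = rng.uniform(0, 2, size=(100, 1))
y = 3 * X[:, 0] + 4 + rng.normal(0, 1, size=100)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty shrinks the weight
ridge0 = Ridge(alpha=0.0).fit(X, y)  # alpha=0 degenerates to plain OLS

print(ols.coef_, ridge.coef_, ridge0.coef_)
```

With alpha=0 the ridge solution matches ordinary least squares exactly; increasing alpha shrinks the weights toward zero.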

2. Lasso Regression

Linear regression + L1 regularization

Lasso regression is another regularized version of linear regression: its regularization term is the ℓ1 norm of the weight vector.

The Lasso regression cost function:

    J(θ) = MSE(θ) + α · Σᵢ₌₁ⁿ |θᵢ|

【Note】

  • The Lasso cost function is not differentiable at θᵢ = 0.
  • Solution: at θᵢ = 0, replace the gradient with a subgradient vector, given by the formula below.
  • Lasso regression subgradient vector:

        g(θ, J) = ∇θ MSE(θ) + α · ( sign(θ₁), sign(θ₂), …, sign(θₙ) )ᵀ

    where sign(θᵢ) = −1 if θᵢ < 0, 0 if θᵢ = 0, and +1 if θᵢ > 0.

Lasso regression has a very important property: it tends to drive the weights of unimportant features all the way to zero.

For example: when α is relatively large, a high-degree polynomial model degenerates into a linear one: the weights of the high-degree polynomial features are set to zero.

In other words, Lasso regression automatically performs feature selection and outputs a sparse model (only a few of the weights are non-zero).
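A small sketch of that sparsity effect (the dataset and alpha are invented for illustration): fit Lasso on degree-10 polynomial features of data that is actually linear, and most of the higher-order weights come out exactly zero:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# The underlying relationship is linear; the polynomial features are redundant
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = 0.5 * X[:, 0] + 2 + rng.normal(0, 0.5, size=200)

model = make_pipeline(
    PolynomialFeatures(degree=10, include_bias=False),
    StandardScaler(),
    Lasso(alpha=0.3, max_iter=10_000),
)
model.fit(X, y)

coefs = model.named_steps["lasso"].coef_
print(coefs)  # most entries are exactly 0.0, i.e. a sparse model
```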

3. Elastic Net

A blend of the previous two: ridge regression + Lasso regression.

The elastic net is a compromise between ridge regression and Lasso regression, controlled by a mix ratio r:

  • r = 0: the elastic net reduces to ridge regression
  • r = 1: the elastic net reduces to Lasso regression

Elastic net cost function:

    J(θ) = MSE(θ) + r · α · Σᵢ₌₁ⁿ |θᵢ| + ((1 − r)/2) · α · Σᵢ₌₁ⁿ θᵢ²
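In scikit-learn's ElasticNet, the mix ratio r corresponds to the l1_ratio parameter (a short sketch; the data and hyperparameters are invented for illustration):

```python
import numpy as np
from sklearn.linear_model import ElasticNet

# Ground truth: y = 1.5*x0 - 2.0*x2, feature x1 is irrelevant
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, 0.0, -2.0]) + rng.normal(0, 0.1, size=100)

# l1_ratio plays the role of the mix ratio r:
# l1_ratio=1 is equivalent to Lasso; l1_ratio near 0 behaves like ridge
enet = ElasticNet(alpha=0.05, l1_ratio=0.5).fit(X, y)
print(enet.coef_)
```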

In general, we should avoid plain (unregularized) linear regression and apply some regularization to the model. So how do we choose among the regularization methods?

summary:

  • Common default: ridge regression

  • If you suspect that only a few of the features are actually useful:

    • Elastic net
    • Lasso
    • In general, the elastic net is preferred, because Lasso can behave erratically when the number of features exceeds the number of training samples or when several features are strongly correlated.
  • api:

    • from sklearn.linear_model import Ridge, ElasticNet, Lasso
      

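In practice α (and the mix ratio for the elastic net) is usually chosen by cross-validation; scikit-learn ships CV variants of all three estimators. A sketch with made-up data:

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV, LassoCV, RidgeCV

# Sparse ground truth: only features 0 and 3 matter
rng = np.random.default_rng(7)
X = rng.normal(size=(150, 5))
y = X @ np.array([2.0, 0.0, 0.0, -1.0, 0.0]) + rng.normal(0, 0.1, size=150)

ridge = RidgeCV(alphas=[0.01, 0.1, 1.0, 10.0]).fit(X, y)
lasso = LassoCV(cv=5).fit(X, y)
enet = ElasticNetCV(l1_ratio=[0.2, 0.5, 0.8], cv=5).fit(X, y)

# Each *CV estimator exposes the hyperparameters it selected
print(ridge.alpha_, lasso.alpha_, enet.alpha_, enet.l1_ratio_)
```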
4. Early Stopping

Strictly speaking, not a regularization technique in its own right.

Early stopping is a way of regularizing iterative learning methods.

The idea: stop training as soon as the validation error reaches its minimum.
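One common way to implement this (a sketch following the usual warm_start pattern; the dataset and hyperparameters are invented) is to train an SGDRegressor one epoch at a time and keep a copy of the model with the lowest validation error:

```python
from copy import deepcopy

import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
X = rng.uniform(-3, 3, size=(200, 1))
y = 0.5 * X[:, 0] ** 2 + X[:, 0] + rng.normal(0, 1, size=200)

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.5, random_state=0
)

# warm_start=True: each call to fit() continues from the previous weights,
# so one call corresponds to one extra epoch of training
sgd = SGDRegressor(max_iter=1, tol=None, warm_start=True,
                   learning_rate="constant", eta0=0.001, random_state=0)

best_error, best_model, best_epoch = float("inf"), None, 0
for epoch in range(500):
    sgd.fit(X_train, y_train)
    val_error = mean_squared_error(y_val, sgd.predict(X_val))
    if val_error < best_error:
        # Snapshot the model at its lowest validation error so far
        best_error, best_model, best_epoch = val_error, deepcopy(sgd), epoch

print(best_epoch, best_error)
```

After the loop, `best_model` is the model as it was at the epoch where validation error bottomed out, rather than the possibly overfit final model.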

Origin blog.csdn.net/qq_35456045/article/details/104516760