Ridge Regression and Lasso Regression

The general form of linear regression

[Figure: the general form of linear regression]
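The figure itself is not available; as a stand-in, the general form is usually written as follows (a standard formulation — the 1/2m scaling is one common convention and may differ from the original figure):

$$
h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n = \theta^{T}x
$$

with the corresponding least-squares loss

$$
J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 .
$$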


Overfitting problem and its solution

    • Question: the overfitting problem, illustrated by the figure below
      [Figure: illustration of the overfitting problem]
    • Solution: (1) Discard features that have little effect on the final prediction; the choice of which features to discard can be made with the PCA algorithm. (2) Use regularization to keep all of the features but shrink the magnitude of the parameters θ in front of them; concretely, this means modifying the form of the loss function used in linear regression, which is exactly what ridge regression and Lasso regression do (a small sketch follows below).
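For intuition about option (2), here is a minimal sketch with scikit-learn, assuming a small noisy sine dataset and a high-degree polynomial model (the degree, noise level, and alpha are illustrative choices, not values from the original post):

```python
# Minimal sketch of overfitting and the regularization fix (all settings illustrative).
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(0)
X = rng.uniform(0, 1, (30, 1))
y = np.sin(2 * np.pi * X).ravel() + rng.normal(scale=0.2, size=30)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Degree-15 polynomial + plain least squares: fits the training noise (overfits).
plain = make_pipeline(PolynomialFeatures(degree=15), LinearRegression())
# Same features + an L2 penalty: keeps all features but shrinks the parameters θ.
ridge = make_pipeline(PolynomialFeatures(degree=15), Ridge(alpha=1e-3))

for name, model in (("plain", plain), ("ridge", ridge)):
    model.fit(X_train, y_train)
    print(name,
          "train MSE:", round(mean_squared_error(y_train, model.predict(X_train)), 4),
          "test MSE:", round(mean_squared_error(y_test, model.predict(X_test)), 4))
```

Typically the unregularized polynomial achieves a lower training error but a much worse test error than the ridge version, which is the overfitting pattern described above.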

Ridge Regression and Lasso Regression


Ridge regression and Lasso regression were introduced to address two problems with linear regression: overfitting, and the possible non-invertibility of X^T X when solving for θ with the normal equation. Both do this by adding a regularization term to the loss function. The loss functions of the three are compared in the figure below:
[Figure: loss functions of linear regression, ridge regression, and Lasso regression]
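Since the figure is missing, here are the three loss functions in one common notation, reconstructed from the description in the text (the exact scaling of the penalty term may differ from the original figure):

$$
\begin{aligned}
\text{Linear regression:} \quad & J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 \\
\text{Ridge regression (L2):} \quad & J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \lambda\sum_{j=1}^{n}\theta_j^{2} \\
\text{Lasso regression (L1):} \quad & J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \lambda\sum_{j=1}^{n}\lvert\theta_j\rvert
\end{aligned}
$$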
where λ is called the regularization parameter. If λ is too large, all the parameters θ are shrunk toward zero and the model underfits; if λ is too small, the overfitting problem is not adequately addressed. Choosing λ well therefore takes some skill.
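In practice λ is usually chosen by cross-validation. A minimal sketch with scikit-learn follows; note that scikit-learn calls the regularization parameter alpha rather than λ, and the candidate grid and synthetic data here are arbitrary choices for illustration:

```python
# Choosing the regularization strength by cross-validation (sketch).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV, RidgeCV

X, y = make_regression(n_samples=100, n_features=20, noise=10.0, random_state=0)

alphas = np.logspace(-3, 3, 13)            # candidate values, illustrative only
ridge = RidgeCV(alphas=alphas, cv=5).fit(X, y)
lasso = LassoCV(alphas=alphas, cv=5).fit(X, y)

print("best alpha (ridge):", ridge.alpha_)
print("best alpha (lasso):", lasso.alpha_)
```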
The biggest difference between ridge regression and Lasso regression is the penalty term: ridge regression adds an L2-norm penalty, while Lasso regression adds an L1-norm penalty. The L1 penalty can drive many of the θ values in the loss function to exactly 0, which is an advantage; in ridge regression all θ values remain nonzero, so the cost of working with the resulting Lasso model can be much smaller than with the ridge model.
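A quick way to see this sparsity difference, sketched with scikit-learn on synthetic data (the alpha values and data shape are arbitrary choices):

```python
# Comparing coefficient sparsity of ridge (L2 penalty) and Lasso (L1 penalty).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Only 5 of the 50 features actually matter in this synthetic problem.
X, y = make_regression(n_samples=200, n_features=50, n_informative=5,
                       noise=5.0, random_state=0)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=1.0, max_iter=10_000).fit(X, y)

print("ridge: zero coefficients =", np.sum(ridge.coef_ == 0))   # typically 0
print("lasso: zero coefficients =", np.sum(lasso.coef_ == 0))   # typically many
```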
Ridge regression
[Figure: comparison of ridge regression and Lasso regression fits]
It can be seen that the Lasso fit eventually tends toward a straight line, because many of the θ values have already been driven to 0, while the ridge fit retains a certain smoothness, because all of the θ values remain nonzero.
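The same behavior can be sketched in code: with a strong penalty, Lasso zeroes out the higher-order polynomial terms so the fitted curve degenerates toward a line, while ridge merely shrinks them (the degree and alpha values below are illustrative assumptions):

```python
# How increasing the penalty flattens the Lasso fit but only smooths the ridge fit.
import numpy as np
from sklearn.linear_model import Lasso, Ridge
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(0)
x = rng.uniform(-1, 1, 50)[:, None]
y = np.sin(3 * x).ravel() + rng.normal(scale=0.1, size=50)

X_poly = PolynomialFeatures(degree=10, include_bias=False).fit_transform(x)

for alpha in (0.01, 0.1, 1.0):             # illustrative penalty strengths
    lasso = Lasso(alpha=alpha, max_iter=50_000).fit(X_poly, y)
    ridge = Ridge(alpha=alpha).fit(X_poly, y)
    print(f"alpha={alpha}: lasso nonzero terms={np.sum(lasso.coef_ != 0)}, "
          f"ridge nonzero terms={np.sum(ridge.coef_ != 0)}")
```

As alpha grows, the number of nonzero Lasso coefficients typically drops toward zero (a nearly straight fit), while ridge keeps all ten polynomial terms, just with smaller magnitudes.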

Excerpted from: https://blog.csdn.net/hzw19920329/article/details/77200475
