The general form of linear regression
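The equation that accompanied this heading did not survive extraction. In the standard presentation (a reconstruction using the post's θ-notation, not the original image), the hypothesis and its squared-error loss are:

```latex
h_\theta(x) = \theta_0 + \theta_1 x_1 + \dots + \theta_n x_n = \theta^{T} x,
\qquad
J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)^2
```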
Overfitting problem and its solution
- Question: illustrate the overfitting problem (the original post does this with a picture, which is not reproduced here).
- Solution: (1) Discard features that contribute little to the final prediction; which features to discard can be decided with an algorithm such as PCA. (2) Use regularization to keep all the features but shrink the magnitude of the parameters θ. Concretely, this means modifying the loss function of linear regression, which is exactly what ridge regression and Lasso regression do.
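A minimal numpy sketch of remedy (2), using polynomial features as an illustrative model (the degree, data, and λ here are assumptions for the demo): an unregularized degree-9 fit interpolates all 10 training points at the cost of large coefficients, while ridge regression keeps every feature but shrinks the parameter vector.

```python
import numpy as np

# 10 training points sampled from a smooth target function.
x = np.linspace(0.0, 1.0, 10)
y = np.sin(2.0 * np.pi * x)

# Degree-9 polynomial features: enough capacity to interpolate all 10 points.
X = np.vander(x, 10)

# Unregularized least squares: fits the training data (almost) exactly.
theta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# Ridge: keep all features, but penalize ||theta||^2 by solving
# (X^T X + lam * I) theta = X^T y.
lam = 1.0
theta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(10), X.T @ y)

# The regularized solution has a much smaller parameter norm.
print(np.linalg.norm(theta_ols), np.linalg.norm(theta_ridge))
```

The shrinkage is guaranteed: for any λ > 0 the ridge solution scales each singular-vector component of the least-squares solution by a factor strictly below 1, so its norm is always smaller.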
Ridge Regression and Lasso Regression
Ridge regression and Lasso regression were introduced to address two problems with linear regression: overfitting, and the fact that XᵀX may not be invertible when solving for θ with the normal equation method. Both achieve this by adding a regularization term to the loss function (the original post compares the three loss functions in a figure).
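The comparison figure did not survive extraction; in the usual notation (a reconstruction, not the original image), the three objectives are:

```latex
\begin{aligned}
J_{\text{linear}}(\theta) &= \frac{1}{2m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)})-y^{(i)}\bigr)^2 \\
J_{\text{ridge}}(\theta)  &= \frac{1}{2m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)})-y^{(i)}\bigr)^2
                             + \lambda\sum_{j=1}^{n}\theta_j^{2} \\
J_{\text{lasso}}(\theta)  &= \frac{1}{2m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)})-y^{(i)}\bigr)^2
                             + \lambda\sum_{j=1}^{n}\lvert\theta_j\rvert
\end{aligned}
```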
Here λ is called the regularization parameter. If λ is too large, all the parameters θ are shrunk toward zero and the model underfits; if λ is too small, the overfitting problem is not adequately addressed. Choosing λ well therefore takes some skill.
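The λ trade-off can be seen directly from the closed-form ridge solution on a toy one-feature dataset (the data and λ values below are illustrative assumptions): a huge λ crushes θ toward zero and underfits, while a tiny λ barely changes the unregularized fit.

```python
import numpy as np

def ridge_theta(X, y, lam):
    # Closed-form ridge solution: theta = (X^T X + lam * I)^{-1} X^T y
    n = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)

X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.0, 4.0, 6.0, 8.0])  # exact relation y = 2x

# Small lam: theta stays close to the true slope 2.
# Huge lam: theta is shrunk nearly to 0, i.e. the model underfits.
for lam in (0.01, 1.0, 1000.0):
    print(lam, ridge_theta(X, y, lam))
```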
The biggest difference between ridge regression and Lasso regression is the penalty term: ridge regression uses the L2 norm, while Lasso regression uses the L1 norm. The L1 penalty can drive many of the θ exactly to 0, producing a sparse model, whereas ridge regression shrinks all the θ but keeps them nonzero. Because so many of its coefficients are zero, the Lasso model is cheaper to store and evaluate than the ridge model.
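Lasso has no closed-form solution, so this sketch uses coordinate descent with soft-thresholding, one standard way to solve it (the tiny orthogonal design below is an assumption chosen so the result is easy to verify by hand): the L1 penalty zeroes the weak feature's coefficient exactly, while ridge only shrinks it.

```python
import numpy as np

def soft_threshold(rho, lam):
    # Soft-thresholding operator: shrinks rho toward 0, clamping to exactly 0
    # when |rho| <= lam. This is where Lasso's exact zeros come from.
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def lasso_cd(X, y, lam, n_sweeps=100):
    # Coordinate descent for 0.5*||y - X theta||^2 + lam*||theta||_1.
    theta = np.zeros(X.shape[1])
    for _ in range(n_sweeps):
        for j in range(X.shape[1]):
            # Partial residual with feature j's contribution removed.
            residual = y - X @ theta + X[:, j] * theta[j]
            rho = X[:, j] @ residual
            theta[j] = soft_threshold(rho, lam) / (X[:, j] @ X[:, j])
    return theta

def ridge_closed_form(X, y, lam):
    n = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)

# Two orthogonal features; y depends strongly on the first, weakly on the second.
X = np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]])
y = 3.0 * X[:, 0] + 0.1 * X[:, 1]

theta_lasso = lasso_cd(X, y, lam=1.0)
theta_ridge = ridge_closed_form(X, y, lam=1.0)
print(theta_lasso)  # weak feature's weight is exactly 0.0
print(theta_ridge)  # both weights shrunk, but neither is 0
```

On this orthogonal design the updates decouple, so the result can be checked by hand: Lasso gives θ = (soft(12, 1)/4, soft(0.4, 1)/4) = (2.75, 0), while ridge gives θ = (12/5, 0.4/5) = (2.4, 0.08).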
It can be seen that the Lasso fit eventually tends toward a straight line, because many of its θ values are already 0, while the ridge fit retains a certain smoothness, because all of its θ values are still present.
Excerpted from: https://blog.csdn.net/hzw19920329/article/details/77200475