Machine Learning Interview Notes - Linear Regression

1. What is linear regression?

  • Linear: the relationship between the two variables is a first-degree (linear) function; its graph is a straight line, hence the name "linear".
  • Nonlinear: the relationship between the two variables is not a first-degree function; its graph is not a straight line, hence the name "nonlinear".
  • Regression: when people measure things, objective conditions limit them, so what they obtain are measured values rather than the true values. To obtain the true value, one measures many times and finally uses these measurements to calculate back ("regress") to the true value. This is the origin of the term regression.

2. What kinds of problems can it solve?

By processing a large amount of observational data, we obtain a mathematical expression that better fits the internal laws of things. That is, we find the relationships hidden between the data, so that results can be simulated, i.e. predicted. The approach is to use known data to derive unknown results. Examples: predicting house prices, determining credit ratings, estimating movie box office, and so on.

3. What is the general expression?

$$y = wx + b$$

where $w$ is called the coefficient of $x$, and $b$ is called the bias term.
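
As a concrete illustration, here is a minimal sketch that fits this expression with scikit-learn's LinearRegression. The synthetic data and the true values w = 3, b = 2 are made-up assumptions of the example, not from the text:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data generated from y = 3x + 2 plus Gaussian noise (illustrative values only)
rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(50, 1))
y = 3 * X.ravel() + 2 + rng.normal(0, 1, size=50)

model = LinearRegression()
model.fit(X, y)

print("coefficient w:", model.coef_[0])  # close to 3
print("bias b:", model.intercept_)       # close to 2
```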

4. How is it computed?

4.1 Loss Function - MSE

$$J(w, b) = \frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} - (w x^{(i)} + b) \right)^2$$

A gradient descent algorithm is used to find the minimum point of this loss, i.e. the point of smallest error, which finally yields $w$ and $b$.
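
A minimal sketch of this procedure in plain NumPy, assuming a one-dimensional input and illustrative noise-free data from y = 3x + 2 (learning rate and epoch count are arbitrary choices):

```python
import numpy as np

def gradient_descent(X, y, lr=0.01, epochs=10000):
    """Fit y = w*x + b by gradient descent on the MSE loss."""
    w, b = 0.0, 0.0
    m = len(X)
    for _ in range(epochs):
        error = (w * X + b) - y
        # Partial derivatives of J(w, b) = (1/m) * sum(error^2)
        dw = (2 / m) * np.sum(error * X)
        db = (2 / m) * np.sum(error)
        w -= lr * dw
        b -= lr * db
    return w, b

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 3 * X + 2  # noise-free, so (w, b) should approach (3, 2)
print(gradient_descent(X, y))
```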

5. How to address overfitting and underfitting?

Use a regularization term, i.e. add a penalty term to the loss function. The regularization terms include L1 regularization, L2 regularization, and ElasticNet. Benefits of adding a regularization term:

  • Controls the magnitude of the parameters, so the model cannot run "lawless".
  • Limits the parameter search space.
  • Helps solve the problems of underfitting and overfitting.

5.1 What is L2 regularization (ridge regression)?

Formula:

$$J = J_0 + \lambda \sum_{w} w^2$$

$J_0$ denotes the loss function above; on top of it we add the sum of the squared parameters $w$, multiplied by $\lambda$. Suppose:

$$L = \lambda \sum_{w} w^2$$

Recall the equation of the unit circle learned before:

$$x^2 + y^2 = 1$$

With $L$ as the L2 regularization term, the task becomes solving for the minimum of $J_0$ under the constraint $L$. Solving $J_0$ traces out contour lines; meanwhile the L2 regularization function $L$ can be drawn in the two-dimensional $w_1 w_2$ plane. As shown below:

(Figure: contour lines of $J_0$ intersecting the circular L2 constraint region in the $w_1 w_2$ plane.)

In the figure, the black circle represents $L$. As gradient descent proceeds, the contours first intersect the circle, and this intersection is unlikely to lie on a coordinate axis. This explains why L2 regularization does not readily produce sparse solutions; at the same time, to minimize the loss function it pushes $w_1$ and $w_2$ toward zero, which prevents overfitting.
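
A small sketch contrasting plain least squares with ridge regression on nearly collinear features (synthetic data; alpha=1.0 is an arbitrary choice). Ridge shrinks the weights but typically zeroes none of them:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 3))
X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=100)  # near-duplicate feature
y = X[:, 0] + rng.normal(scale=0.5, size=100)

# Plain least squares: weights on the two near-duplicate columns are unstable
print(LinearRegression().fit(X, y).coef_)
# Ridge (L2): weights shrunk toward zero, but typically none exactly zero
print(Ridge(alpha=1.0).fit(X, y).coef_)
```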

5.2 In what scenarios is L2 regularization used?

If the data are linearly correlated but LinearRegression does not fit them very well and regularization is needed, consider ridge regression (L2). If the input feature dimension is high and the linear relationship is sparse, ridge regression is not appropriate; consider Lasso regression instead.

5.3 What is L1 regularization (Lasso regression)?

The difference between L1 regularization and L2 regularization lies in the penalty term:

$$J = J_0 + \lambda \sum_{w} |w|$$

Solving $J_0$ again traces out contour lines; meanwhile the L1 regularization function can be drawn in the two-dimensional $w_1 w_2$ plane. As shown below:

(Figure: contour lines of $J_0$ intersecting the diamond-shaped L1 constraint region in the $w_1 w_2$ plane.)

In the figure, the black diamond represents the penalty term. As gradient descent proceeds, the contours first intersect the diamond, and this intersection is very likely to lie on a coordinate axis. This explains why L1 regularization readily produces sparse solutions.

5.4 In what scenarios is L1 regularization used?

L1 regularization (Lasso regression) can make some feature coefficients smaller, and can even drive coefficients with small absolute values directly to 0, thereby enhancing the model's generalization ability. For data with many features, especially when the linear relationship is sparse, L1 regularization (Lasso regression) is suitable; and when you need to pick out the main features from a large pile of features, L1 regularization (Lasso regression) is also the first choice. A sketch of this behaviour follows below.
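
The sketch assumes synthetic high-dimensional data where only 3 of 50 features actually matter (alpha=0.1 is an arbitrary choice):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 50))
y = 2 * X[:, 0] - 3 * X[:, 1] + 1.5 * X[:, 2] + 0.1 * rng.normal(size=100)

lasso = Lasso(alpha=0.1).fit(X, y)
print("non-zero coefficients:", np.sum(lasso.coef_ != 0))   # roughly 3
print("selected features:", np.where(lasso.coef_ != 0)[0])  # roughly [0, 1, 2]
```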

5.5 What is ElasticNet regression?

ElasticNet regression combines the L1 regularization term and the L2 regularization term; its formula is as follows:

$$J = J_0 + \lambda_1 \sum_{w} |w| + \lambda_2 \sum_{w} w^2$$

5.6 In what scenarios is ElasticNet regression used?

When we find that Lasso regression regularizes too much (too many feature coefficients are sparsified to 0) while ridge regression does not regularize enough (the regression coefficients decay too slowly), we can consider using ElasticNet regression to combine the two and obtain better results.
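
A minimal sketch using scikit-learn's ElasticNet on synthetic data (alpha and l1_ratio are arbitrary illustrative values; l1_ratio balances the two penalty terms in the formula above):

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 50))
y = 2 * X[:, 0] - 3 * X[:, 1] + 0.1 * rng.normal(size=100)

# l1_ratio=1.0 would behave like pure Lasso, l1_ratio=0.0 like pure ridge;
# an intermediate value mixes sparsity with gentler shrinkage.
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
print("non-zero coefficients:", np.sum(enet.coef_ != 0))
```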

6. Does linear regression require the dependent variable to be normally distributed?

In linear regression we assume the noise is normally distributed with mean zero. When the noise follows a normal distribution $N(0, \delta^2)$, the dependent variable follows a normal distribution $N(ax^{(i)} + b, \delta^2)$, where the prediction function is $y = ax^{(i)} + b$. This conclusion can be derived from the probability density function of the normal distribution. In other words, when the noise is normally distributed, the dependent variable must also follow a normal distribution.

Therefore, before fitting data with a linear regression model, the data should first satisfy or approximately satisfy a normal distribution; otherwise the fitted function obtained will be incorrect.
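
One common way to check this assumption in practice is to test the residuals for normality, e.g. with a Shapiro-Wilk test. A sketch with synthetic data (the Gaussian noise is by construction, and the interpretation threshold of 0.05 is conventional, not from the text):

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3 * X.ravel() + 2 + rng.normal(0, 1, size=200)  # Gaussian noise by construction

residuals = y - LinearRegression().fit(X, y).predict(X)

# Shapiro-Wilk: a p-value above 0.05 means we cannot reject normality
stat, p = stats.shapiro(residuals)
print(f"statistic={stat:.3f}, p-value={p:.3f}")
```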


Author: @mantchs

GitHub:https://github.com/NLP-LOVE/ML-NLP

Welcome to join the discussion and improve this project together! Group number: 541954936


Origin: www.cnblogs.com/mantch/p/11141064.html