Multiple Linear Regression in Detail

I am a beginner just getting started; I hope to record what I have learned, like taking notes, and also to help others who are getting started.

Table of contents

1. Problem description

2. Problem Analysis

3. Solve the problem - find w and b

    1. Vector form conversion

    2. The objective function

    3. Set the derivative to 0 and solve

    4. The final model result

4. A hidden problem - XX^{T} may not be a full-rank matrix

5. The solution to the hidden problem - regularization

    1. L1 regularization - Lasso regression

    2. L2 regularization - ridge regression

6. Variants and applications of linear regression

7. Python implementation

    1. Multiple Linear Regression

    2. Ridge regression

    3. Lasso regression

8. Linear model - from regression problems to classification problems


1. Problem description

We have a data set D at hand: each sample is described by d attributes, that is, \boldsymbol{x} = (x_{1}; x_{2}; ...; x_{d}), where x_{i} is the value of sample \boldsymbol{x} on the i-th attribute, and each sample \boldsymbol{x}_{i} has a corresponding result value y_{i}.

Now a new sample \boldsymbol{x}_{j} arrives, and we want to know its result value y_{j}.


2. Problem Analysis

Based on the data set D, we need to find a linear model that predicts y_{j} from \boldsymbol{x}_{j}; that is, we need to find appropriate \boldsymbol{w} and b for f(\boldsymbol{x}_{i}) = \boldsymbol{w}^{T}\boldsymbol{x}_{i} + b.


3. Solve the problem - find w and b

We can use the method of least squares to solve this problem.

1. Vector form conversion

First, combine \boldsymbol{w} and b into a single vector \hat{\boldsymbol{w}} = (\boldsymbol{w}; b), of size (d+1)\times 1;

Then rewrite the data matrix X: X = \begin{pmatrix} \boldsymbol{x}_{1} & \boldsymbol{x}_{2} & ... & \boldsymbol{x}_{m}\\ 1 & 1 & ... & 1 \end{pmatrix}, of size (d+1)\times m.

Then write the labels in vector form as well: \boldsymbol{y} = (y_{1}; y_{2}; ...; y_{m}).
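As a minimal sketch of this conversion (all names and sizes here are made up, not from the original post; m samples with d attributes are assumed to sit in a NumPy array X_raw):

import numpy as np

# Hypothetical raw data: m samples, each with d attributes
m, d = 5, 3
X_raw = np.random.rand(m, d)               # row i is sample x_i
y = np.random.rand(m)                      # corresponding result values y_i

# Rewrite the data matrix as in the text: column i is (x_i; 1), size (d+1) x m
X = np.vstack([X_raw.T, np.ones((1, m))])
print(X.shape)                             # (4, 5), i.e. (d+1) x m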

2. The objective function

 \underset{\hat{w}}{arg min}(\boldsymbol{y}-\hat{w}^{T}X)(\boldsymbol{y}-\hat{w}^{T}X)^{T}

Let E = (\boldsymbol{y}-\hat{w}^{T}X)(\boldsymbol{y}-\hat{w}^{T}X)^{T}.

3. Set the derivative to 0 and solve

\frac{\partial E}{\partial \hat{w}} = 2X(X^{T}\hat{w}-\boldsymbol{y})=0\rightarrow \hat{w}=(XX^{T})^{-1}X\boldsymbol{y}  

4. The final model result

Let \hat{\boldsymbol{x}}_{i} = (\boldsymbol{x}_{i}; 1). Then the final model is

f(\hat{\boldsymbol{x}}_{i}) = \hat{\boldsymbol{x}}_{i}^{T}(XX^{T})^{-1}X\boldsymbol{y}
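A minimal NumPy sketch of this closed-form solution, reusing the X and y built in the sketch above (np.linalg.pinv is used instead of a plain inverse so the example also runs when XX^{T} is singular, which is the issue discussed in the next section):

# Closed-form least-squares solution: w_hat = (X X^T)^{-1} X y
w_hat = np.linalg.pinv(X @ X.T) @ X @ y    # shape (d+1,), last entry plays the role of b

# Prediction for a new (made-up) sample x_j with d attributes: append the constant 1 first
x_j = np.random.rand(d)
x_j_hat = np.append(x_j, 1.0)              # \hat{x}_j = (x_j; 1)
y_j_pred = x_j_hat @ w_hat
print(y_j_pred)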


4. A hidden problem - XX^{T} may not be a full-rank matrix

XX^{T} may not be a full-rank matrix, in which case multiple optimal solutions \hat{\boldsymbol{w}} exist. Which one should be selected as \hat{\boldsymbol{w}}?

For example, if the number of samples is small while the number of feature attributes is large, even exceeding the number of samples, then XX^{T} is not a full-rank matrix and multiple solutions \hat{\boldsymbol{w}} can be obtained.
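A quick way to see this (a sketch with made-up sizes): when the number of samples m is smaller than d+1, the rank of XX^{T} is at most m, so the (d+1) x (d+1) matrix cannot be full rank:

import numpy as np

# Hypothetical extreme case: only m = 3 samples but d = 5 attributes
m, d = 3, 5
X_raw = np.random.rand(m, d)
X = np.vstack([X_raw.T, np.ones((1, m))])   # size (d+1) x m = 6 x 3

A = X @ X.T                                 # size 6 x 6
print(np.linalg.matrix_rank(A))             # at most m = 3, so A is singular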


5. The solution to the hidden problem - regularization

The role of regularization is to choose a model that has both small empirical risk and low model complexity.

1. L1 regularization - Lasso regression

Add a term \lambda \sum_{i=1}^{d}|w_{i}| to the objective function.

Then, the objective function becomes \underset{\hat{w}}{arg min}((\boldsymbol{y}-\hat{w}^{T}X)(\boldsymbol{y}-\hat{w}^{T}X)^{T}+\lambda \sum_{i=1}^{d}|w_{i}|)

The first term is the empirical risk mentioned above, and the second term controls the complexity of the model.

Here \lambda > 0 controls the strength of the penalty: as \lambda \rightarrow \infty, \hat{\boldsymbol{w}}\rightarrow 0; as \lambda \rightarrow 0, \hat{\boldsymbol{w}}\rightarrow (XX^{T})^{-1}X\boldsymbol{y}.

This is also known as Lasso regression.

Geometrically (assuming there are only two attributes), the contours of the squared-error term and the contour of the L1 regularizer often intersect on a coordinate axis, which means one of the attributes is discarded. This reflects a feature-selection property: compared with the L2 regularization below, L1 more easily yields a sparse solution - that is, the resulting \boldsymbol{w} vector has fewer non-zero values. The code sketch at the end of the next subsection illustrates this difference.

2. L2 regularization - ridge regression

For L2 regularization, add a term \lambda \sum_{i=1}^{d}w_{i}^{2} to the objective function instead.

Then, the objective function becomes \underset{\hat{w}}{arg min}((\boldsymbol{y}-\hat{w}^{T}X)(\boldsymbol{y}-\hat{w}^{T}X)^{T}+\lambda \sum_{i=1}^{d}w_{i}^{2})

This is also known as ridge regression.
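For reference, under the same notation as above, ridge regression still has a closed-form solution (a standard result not derived in the original post; it assumes, as a slight simplification, that the bias term is penalized together with the weights):

\hat{\boldsymbol{w}} = (XX^{T}+\lambda I)^{-1}X\boldsymbol{y}

Adding \lambda I to the diagonal makes the matrix invertible even when XX^{T} is not full rank, which is exactly why ridge regression resolves the hidden problem from section 4.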

L2 regularization shrinks the parameters evenly, so that the coefficients of the fitted model remain comparable in size. Although it does not reduce the number of terms, it keeps the coefficients balanced; this is the essential difference from L1 regularization.
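A small sketch illustrating this difference with scikit-learn (the data is synthetic and the alpha values are arbitrary; see section 7 for basic usage):

import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic data: 50 samples, 20 attributes, only 3 of which actually matter
rng = np.random.RandomState(0)
X = rng.randn(50, 20)
y = 3 * X[:, 0] - 2 * X[:, 1] + X[:, 2] + 0.1 * rng.randn(50)

lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=0.1).fit(X, y)

# Lasso drives many coefficients exactly to zero; Ridge only shrinks them
print('zero coefficients (Lasso):', np.sum(lasso.coef_ == 0))
print('zero coefficients (Ridge):', np.sum(ridge.coef_ == 0))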


6. Variants and applications of linear regression

If the relationship in the problem is not well described by plain linear regression, we can instead let the linear model approximate some transformed (derived) version of y.

For example, log-linear regression: \ln y_{i} = \boldsymbol{w}^{T}\boldsymbol{x}_{i} + b

More generally: y = g^{-1}(\boldsymbol{w}^{T}\boldsymbol{x} + b), which is called a generalized linear model.
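As a small sketch of the log-linear case (the arrays here are made up; the only requirement is that the targets are strictly positive so the logarithm is defined):

import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data with positive targets
X_raw = np.random.rand(100, 4)
y_pos = np.exp(X_raw @ np.array([1.0, -0.5, 2.0, 0.3]) + 0.1)

# Log-linear regression: fit ln(y) = w^T x + b, then map back with exp (g^{-1} = exp)
model = LinearRegression().fit(X_raw, np.log(y_pos))
y_pred = np.exp(model.predict(X_raw))
print(y_pred[:5])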


7. Python implementation

1. Multiple Linear Regression

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# x, y are the prepared attribute data and label data, respectively
X_train, X_test, Y_train, Y_test = train_test_split(x, y, test_size=0.3, random_state=1)

model = LinearRegression()
model.fit(X_train, Y_train)              # fit w and b on the training set
score = model.score(X_test, Y_test)      # R^2 score on the test set
print('Model test score: ' + str(score))
Y_pred = model.predict(X_test)
print(Y_pred)

2. Ridge regression

from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# x, y are the prepared attribute data and label data, respectively
X_train, X_test, Y_train, Y_test = train_test_split(x, y, test_size=0.3, random_state=1)

model = Ridge(alpha=1)                   # alpha is the regularization strength (lambda above)
model.fit(X_train, Y_train)
score = model.score(X_test, Y_test)
print('Model test score: ' + str(score))
Y_pred = model.predict(X_test)
print(Y_pred)

3. Lasso regression

from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

# x, y are the prepared attribute data and label data, respectively
X_train, X_test, Y_train, Y_test = train_test_split(x, y, test_size=0.3, random_state=1)

model = Lasso(alpha=0.1)                 # alpha is the regularization strength (lambda above)
model.fit(X_train, Y_train)
score = model.score(X_test, Y_test)
print('Model test score: ' + str(score))
Y_pred = model.predict(X_test)
print(Y_pred)

8. Linear model - from regression problems to classification problems

Everything above uses linear models to solve regression problems. In fact, linear models can also be used to solve classification problems - logistic regression (also called log-odds regression).

For details, see the post on Logistic Regression (log-odds regression).


Everyone is welcome to criticize and correct in the comment area, thank you~
