Principle of Linear Regression Algorithm

The least squares method for univariate linear regression

1. Univariate linear regression

Preliminary knowledge

(1) Variance

Variance measures how dispersed a sample is. If all the sample values are equal, the variance is 0. The smaller the variance, the more concentrated the sample; the larger the variance, the more spread out the sample.

The variance calculation formula is as follows:

$$\operatorname{var}(x) = s^{2} = \frac{\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^{2}}{n - 1}$$

Python example:

import numpy as np

print(np.var([6, 8, 10, 14, 18], ddof=1))   # prints 23.2

NumPy's var function computes the variance directly. The ddof parameter applies Bessel's correction: setting it to 1 divides by n - 1 instead of n, which gives an unbiased estimator of the population variance from the sample.
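As a quick check of the formula, the same value can be computed by hand on this data (a small sketch; the variable names below are just for illustration):

import numpy as np

x = [6, 8, 10, 14, 18]
x_bar = sum(x) / len(x)                                          # sample mean, 11.2
var_manual = sum((xi - x_bar) ** 2 for xi in x) / (len(x) - 1)   # divide by n - 1 (Bessel's correction)
print(var_manual)                                                # 23.2
print(np.var(x, ddof=1))                                         # 23.2, the same value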

 (2) Covariance

Covariance describes how two variables vary together. If the two variables move in the same direction, that is, when one is above its expected value the other also tends to be above its own expected value, the covariance is positive. If they move in opposite directions, that is, when one is above its expected value the other tends to be below its own, the covariance is negative. If the two variables are linearly uncorrelated, the covariance is 0; however, a covariance of 0 only rules out a linear relationship and does not mean the variables have no other kind of dependence.

The covariance formula is as follows:

$$\operatorname{cov}(x, y) = \frac{\sum_{i=1}^{n}\left(x_i - \bar{x}\right)\left(y_i - \bar{y}\right)}{n - 1}$$

Python example:

import numpy as np

print(np.cov([6, 8, 10, 14, 18], [7, 9, 13, 17.5, 18])[0][1])   # prints 22.65

Univariate Linear Regression Formula

$$y = \alpha + \beta x, \qquad \beta = \frac{\operatorname{cov}(x, y)}{\operatorname{var}(x)}, \qquad \alpha = \bar{y} - \beta \bar{x}$$
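Combining the two preliminaries, the coefficients for the sample data above can be computed directly (a minimal sketch; the variable names are for illustration only):

import numpy as np

x = np.array([6, 8, 10, 14, 18])
y = np.array([7, 9, 13, 17.5, 18])

beta = np.cov(x, y)[0][1] / np.var(x, ddof=1)   # slope = covariance / variance of x
alpha = y.mean() - beta * x.mean()              # the fitted line passes through the point of means
print(alpha, beta)                              # roughly 1.97 and 0.98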

2. The least squares method for multiple linear regression

Multiple Linear Regression Formula

$$y = \alpha + \beta_{1} x_{1} + \beta_{2} x_{2} + \cdots + \beta_{n} x_{n}$$

 Written in matrix form as follows:

$$Y = X\beta, \qquad \beta = \left(X^{T} X\right)^{-1} X^{T} Y$$

This solution can be computed in Python with NumPy's matrix operations:
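For example, a sketch using the normal equation on the small one-feature training set from this article, with a column of ones added for the intercept:

import numpy as np

X = np.array([[1, 6], [1, 8], [1, 10], [1, 14], [1, 18]])   # first column of ones gives the intercept term
y = np.array([7, 9, 13, 17.5, 18])

beta = np.linalg.inv(X.T @ X) @ X.T @ y   # the normal equation: (X^T X)^(-1) X^T y
print(beta)                               # intercept and slope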

NumPy also provides a least squares function, np.linalg.lstsq, that implements this process:
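The same fit with np.linalg.lstsq, which solves the least squares problem without explicitly inverting the matrix:

import numpy as np

X = np.array([[1, 6], [1, 8], [1, 10], [1, 14], [1, 18]])
y = np.array([7, 9, 13, 17.5, 18])

beta, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)
print(beta)   # same coefficients as the normal-equation solution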

3. Polynomial regression (when the relationship is no longer a straight line)

In the example above, we assumed that the relationship between the explanatory variable and the response variable is linear. That is not always the case. Below we use polynomial regression, a special form of multiple linear regression in which terms of degree greater than 1 are added as features. Curvilinear relationships in real-world data can be captured by adding these polynomial terms, and the implementation is otherwise the same as multiple linear regression.

Example:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

X_train = [[6], [8], [10], [14], [18]]
y_train = [[7], [9], [13], [17.5], [18]]
X_test = [[6], [8], [11], [16]]
y_test = [[8], [12], [15], [18]]

# fit a straight line to the training data
regressor = LinearRegression()
regressor.fit(X_train, y_train)
xx = np.linspace(0, 26, 100)
yy = regressor.predict(xx.reshape(xx.shape[0], 1))

plt.figure()   # the original code used a helper, runplt(), to create a labeled figure
plt.plot(X_train, y_train, 'k.')
plt.plot(xx, yy)

# add quadratic features and fit a second model on them
quadratic_featurizer = PolynomialFeatures(degree=2)
X_train_quadratic = quadratic_featurizer.fit_transform(X_train)
X_test_quadratic = quadratic_featurizer.transform(X_test)
regressor_quadratic = LinearRegression()
regressor_quadratic.fit(X_train_quadratic, y_train)
xx_quadratic = quadratic_featurizer.transform(xx.reshape(xx.shape[0], 1))
plt.plot(xx, regressor_quadratic.predict(xx_quadratic), 'r-')
plt.show()

print(X_train)
print(X_train_quadratic)
print(X_test)
print(X_test_quadratic)
print('Univariate linear regression r-squared', regressor.score(X_test, y_test))
print('Quadratic regression r-squared', regressor_quadratic.score(X_test_quadratic, y_test))

4. Optimization

The preceding calculation can be understood as minimizing a cost function, which is defined as follows:

$$J(\beta) = \sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2} = \left(Y - X\beta\right)^{T}\left(Y - X\beta\right)$$

Here $X$ is the matrix of explanatory variables. When there are many variables (tens of thousands), computing $(X^{T}X)^{-1}$ becomes very expensive. In addition, if the determinant of $X^{T}X$ is 0, that is, the matrix is singular, the inverse cannot be computed at all. We therefore introduce another method of parameter estimation, gradient descent. The fitting goal does not change: we still estimate the parameters by minimizing the cost function.

Fitting the model with gradient descent

Stochastic gradient descent (SGD)

Below we use scikit-learn's SGDRegressor class to estimate the model parameters. It can fit linear models by optimizing different cost functions; the default is the residual sum of squares (squared error).

The cost function formed by the sum of squared residuals is a convex function, so the gradient descent method can find the global minimum.
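To make the idea concrete, here is a minimal sketch of plain batch gradient descent on the one-feature data used earlier, with an assumed fixed learning rate; it only illustrates the principle and is not how SGDRegressor is implemented:

import numpy as np

x = np.array([6, 8, 10, 14, 18], dtype=float)
y = np.array([7, 9, 13, 17.5, 18], dtype=float)

alpha, beta = 0.0, 0.0   # intercept and slope, initialized to zero
lr = 0.005               # learning rate (assumed; small enough for this data)

for _ in range(20000):
    error = alpha + beta * x - y
    grad_alpha = 2 * error.mean()         # gradient of the mean squared error w.r.t. alpha
    grad_beta = 2 * (error * x).mean()    # gradient of the mean squared error w.r.t. beta
    alpha -= lr * grad_alpha
    beta -= lr * grad_beta

print(alpha, beta)   # converges to roughly 1.97 and 0.98, the least squares solution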

Example:

import numpy as np
from sklearn.datasets import load_boston   # note: load_boston was removed in scikit-learn 1.2, so this requires an older version
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.preprocessing import StandardScaler

data = load_boston()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target)

# standardize the features and the target
X_scaler = StandardScaler()
y_scaler = StandardScaler()
X_train = X_scaler.fit_transform(X_train)
y_train = y_scaler.fit_transform(y_train.reshape(-1, 1)).ravel()
X_test = X_scaler.transform(X_test)
y_test = y_scaler.transform(y_test.reshape(-1, 1)).ravel()

regressor = SGDRegressor(loss='squared_error')   # squared error is also the default; older versions call it 'squared_loss'
scores = cross_val_score(regressor, X_train, y_train, cv=5)
print('Cross-validation R-squared scores:', scores)
print('Mean cross-validation R-squared:', np.mean(scores))
regressor.fit(X_train, y_train)
print('Test set R-squared:', regressor.score(X_test, y_test))

Ridge regression

Ridge regression addresses some of the problems of ordinary least squares by imposing a penalty on the size of the coefficients. The ridge coefficients minimize a penalized residual sum of squares:

$$\min_{\beta}\; \lVert X\beta - y \rVert_{2}^{2} + \alpha \lVert \beta \rVert_{2}^{2}$$

where $\alpha \geq 0$ is the regularization parameter that controls how strongly the coefficients are shrunk.
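A minimal sketch with scikit-learn's Ridge class on the small data set used earlier (here alpha is the regularization strength, not the intercept from the earlier formulas):

from sklearn.linear_model import Ridge

X_train = [[6], [8], [10], [14], [18]]
y_train = [7, 9, 13, 17.5, 18]

ridge = Ridge(alpha=1.0)   # larger alpha means a stronger penalty on the coefficients
ridge.fit(X_train, y_train)
print(ridge.intercept_, ridge.coef_)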


Source: https://blog.csdn.net/stephon_100/article/details/126059810