Full analysis of linear regression: from basic theory to Python implementation

Overview

Linear Regression is a cornerstone of statistics and machine learning. From the simplest single-variable linear regression to more complex multi-variable linear regression, this method gives us a powerful tool for understanding relationships in data and making educated predictions.

In this article, Xiaobai will walk you through the basic theory of linear regression in detail and demonstrate it with hands-on cases in Python. Whether you are a beginner in machine learning or want a deeper understanding of linear regression, this article will provide insights and guidance.

What is linear regression?

Linear Regression is one of the most widely used and well-known algorithms in the field of machine learning. It is a method for estimating the relationship between continuous variables. For example, consider housing price prediction: if we have records of past housing prices and the factors related to them (house area, construction year, etc.), we can use linear regression to predict future housing prices.

Key concepts

dependent variable

The dependent variable (Dependent Variable) is the variable we want to predict; it is the main target we care about.

For example, if we want to predict housing prices, then the housing price is the dependent variable. We predict it from factors (features) such as house area and construction year.

Choosing the right dependent variable is the first step in any regression analysis; it ensures we know what our goal is and how to measure success.

independent variable

The independent variable is the variable we use to predict the dependent variable. The independent variable affects the dependent variable, but is not affected by the dependent variable.

For example, in the housing price example, the house area, location, proximity to schools, construction year, and so on are all independent variables (Independent Variable): factors that may affect the housing price.

Choosing the correct independent variables is key to an effective linear regression model. If we choose irrelevant or redundant independent variables, the model will perform poorly and its results will be harder to interpret.

linear relationship

The core idea of linear regression is to use a best-fitting straight line to describe the relationship between the independent variable and the dependent variable. This line is also called the regression line. In linear regression, we assume that the relationship between the independent variable and the dependent variable is linear, which means that a change in the independent variable causes a proportional (linear) change in the dependent variable.

Finding a straight line that describes the distribution of the data points is critical for prediction and for explaining the relationship. This line, called the line of best fit, minimizes the distance of each data point from the line; these distances are called errors (residuals).

Formula:
y = wx + b

Here is a simple case:

Data: salary and age (two features)
Goal: predict how much money the bank will lend me (the label)

Salary    Age    Loan quota
4000      25     20000
8000      30     70000
5000      28     35000
7500      33     50000
12000     40     85000

Both salary and age affect the final bank loan outcome. So how much impact do they each have?

X1 and X2 represent our two features (salary and age), and Y represents how much money the bank will eventually lend us.

The task is to find the most appropriate line (in higher dimensions, a plane or hyperplane) that best fits our data points, as shown below:

[Figure: a regression plane fitted to the data points]
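
As a quick illustration (not part of the original article), here is a minimal sketch that fits an ordinary least squares model to the five rows of the table above with scikit-learn. The learned weights suggest roughly how much one extra unit of salary or one extra year of age changes the predicted loan quota; the prediction at the end uses a made-up applicant.

from sklearn.linear_model import LinearRegression
import numpy as np

# The five samples from the table above: [salary, age] -> loan quota
X = np.array([[4000, 25],
              [8000, 30],
              [5000, 28],
              [7500, 33],
              [12000, 40]])
y = np.array([20000, 70000, 35000, 50000, 85000])

model = LinearRegression()
model.fit(X, y)

print("w1 (salary), w2 (age):", model.coef_)  # weight of each feature
print("w0 (intercept):", model.intercept_)    # bias term
print("Prediction for salary=9000, age=35:", model.predict([[9000, 35]])[0])  # hypothetical applicant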

basic linear equations

understanding the expression

y = w_0 + w_1x_1 + w_2x_2

  • w_0 is the bias (intercept) term, equivalent to the constant c in y = wx + c
  • x_1, x_2 are the independent variables (Independent Variable), i.e. the factors used for prediction
  • y is the dependent variable (Dependent Variable), i.e. the value we want to predict
  • w_1, w_2 are the slopes (weights); each one represents the change in the expected value of y when the corresponding x increases by one unit

In a linear model, the weight vector w directly expresses the importance of each feature in the prediction, which is why linear models have good interpretability. For this kind of multi-feature prediction (multiple linear regression), linear regression learns these w values from the data, then uses them to build a model and make predictions on new data. Simply put, it learns a linear model that predicts the actual output label as accurately as possible.

For multivariable linear regression, we can express the relationship between the parameter vector θ and the feature vector x compactly: multiplying the two vectors gives a single scalar, which is the estimate. If we let the first feature of every sample be x_0 = 1 (so that θ_0 plays the role of the bias), the linear model can be written in the general vector form:

h_\theta(x) = \sum_{i=0}^{n} \theta_i x_i = \theta^T x
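
To make the vector form concrete, here is a tiny sketch (with made-up numbers, not from the article) that prepends x_0 = 1 to a feature vector and computes the prediction as a dot product:

import numpy as np

theta = np.array([5.0, 2.0, 3.0])  # theta = [theta_0 (bias), theta_1, theta_2], illustrative values
x_raw = np.array([4.0, 1.5])       # original features x_1, x_2

x = np.insert(x_raw, 0, 1.0)       # prepend x_0 = 1 -> [1.0, 4.0, 1.5]
prediction = theta @ x             # theta^T x = 5 + 2*4 + 3*1.5 = 17.5
print(prediction)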

multivariable regression

In multivariable linear regression there are multiple independent variables, and the equation can be expressed as:
y = w_0 + w_1x_1 + w_2x_2 + w_3x_3 + ... + w_nx_n

In the above formula, we have n independent variables, and each has a corresponding coefficient (weight). This allows us to consider more influencing factors in the model, thereby improving the accuracy of predictions.

Error term and loss function

Errors

When a model does not fit all data points perfectly, it is mainly because of noise in the data or factors that affect the observations but are not included in the model. The error term helps us quantify this uncertainty.

Mean Squared Error (MSE)

MSE (Mean Squared Error) is a key indicator for evaluating model performance. It measures the average squared difference between the predicted values and the actual values.

Formula:
MSE = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2

  • y_i: the actual value
  • \hat{y}_i: the predicted value

The smaller the MSE, the higher the accuracy of the model.
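
A minimal sketch of computing MSE with NumPy (the numbers are made up for illustration):

import numpy as np

y_true = np.array([3.0, 5.0, 7.0, 9.0])  # actual values y_i
y_pred = np.array([2.5, 5.5, 6.0, 9.5])  # predicted values y_hat_i

mse = np.mean((y_true - y_pred) ** 2)    # mean of the squared errors
print(mse)  # 0.4375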

least squares method

What is least squares method

The least squares method (Least Squares) is a classic method in statistics and numerical analysis, widely used in regression analysis and curve fitting. Its core idea is to determine the unknown parameters according to a clear criterion: make the sum of squared prediction errors of the model on the training data as small as possible.

Formula:
S = \sum_{i=1}^{n}\left(y_i - (w_0 + w_1 x_i)\right)^2

The idea of ​​least squares

Consider the simplest linear regression model:
y = w_0 + w_1 x

Our goal is to minimize the sum of squared errors:
S = \sum_{i=1}^{n}\left(y_i - (w_0 + w_1 x_i)\right)^2

We take the partial derivatives of S with respect to w_0 and w_1 and set them to 0. This gives a system of equations; solving it yields the values of w_0 and w_1 that minimize the sum of squared errors.

Detailed derivation

Partial derivative with respect to w_0

To find the partial derivative of S with respect to w_0, we use the chain rule together with the derivative of a square, (u^2)' = 2u u':

\frac{\partial}{\partial w_0}\left(y_i - (w_0 + w_1 x_i)\right)^2 = 2\left(y_i - w_0 - w_1 x_i\right)\cdot\frac{\partial}{\partial w_0}\left(y_i - w_0 - w_1 x_i\right) = 2\left(y_i - w_0 - w_1 x_i\right)\times(-1) = 2\left(w_0 + w_1 x_i - y_i\right)

To minimize S, we set the sum of these partial derivatives over all samples equal to 0:

2\sum_{i=1}^{n}\left(w_0 + w_1 x_i - y_i\right) = 0
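
Setting the partial derivative with respect to w_1 to zero in the same way (not shown above) gives a second equation; solving the two together yields the familiar closed-form solution for simple linear regression. Here is a minimal NumPy sketch of that solution on made-up data:

import numpy as np

# Made-up one-feature data, roughly y = 2x
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 6.2, 8.1, 9.9])

x_mean, y_mean = x.mean(), y.mean()

# Closed-form least squares:
# w1 = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean)^2),  w0 = y_mean - w1 * x_mean
w1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
w0 = y_mean - w1 * x_mean

print("w1 (slope):", w1)      # close to 2
print("w0 (intercept):", w0)  # close to 0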

Model assumptions for linear regression

linear relationship

The linear regression model assumes that there is a linear relationship between the dependent variable and the independent variables, i.e. the relationship can be represented by one or more coefficients (w1, w2, ...) multiplying the predictor variables (x1, x2, ...) plus a constant (w0). This is the foundation of the linear regression model, and violating this assumption may reduce the model's predictive power.

This linear relationship is where the name linear regression comes from. If the assumption is not met, a linear regression model is not a suitable choice, because we cannot accurately capture the relationship between the dependent variable and the independent variables with a straight line.

How to determine whether it is a linear relationship:

"""
@Module Name: 通过残差图判断线性关系.py
@Author: CSDN@我是小白呀
@Date: October 17, 2023

Description:
Using residual plots to check for a linear relationship
"""
from sklearn.linear_model import LinearRegression
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

plt.style.use("fivethirtyeight")  # set the plotting style
plt.rcParams['font.sans-serif'] = ['SimHei']  # set the 'SimHei' font so CJK characters can be displayed
plt.rcParams['axes.unicode_minus']=False

# Set the random seed
np.random.seed(0)

# Simulate linear data: y = 3x + noise
X_linear = np.random.rand(100, 1) * 10  # 100 samples in the range 0-10
y_linear = 3 * X_linear + np.random.randn(100, 1) * 2  # linear relationship + Gaussian noise

# Simulate nonlinear data: y = 3x^2 + noise
X_non_linear = np.random.rand(100, 1) * 10  # 100 samples in the range 0-10
y_non_linear = 3 * X_non_linear**2 + np.random.randn(100, 1) * 2  # quadratic relationship + Gaussian noise


# Instantiate the linear regression model
model = LinearRegression()

# Fit the linear data
model.fit(X_linear, y_linear)  # train
y_linear_pred = model.predict(X_linear)  # predict
residuals_linear = y_linear - y_linear_pred  # residuals

# Fit the nonlinear data
model.fit(X_non_linear, y_non_linear)  # train
y_non_linear_pred = model.predict(X_non_linear)  # predict
residuals_non_linear = y_non_linear - y_non_linear_pred  # residuals

# Draw the residual plots
f, ax = plt.subplots(1, 2, figsize=(16, 8))
f.suptitle('Residual plots')

# Residual plot for the linear data
sns.residplot(ax=ax[0], x=y_linear_pred.flatten(), y=residuals_linear.flatten(), lowess=True, line_kws={'color': 'red', 'lw': 1})
ax[0].axhline(y=0, color='gray', linestyle='--')
ax[0].set_xlabel('Predicted value')
ax[0].set_ylabel('Residual')
ax[0].set_title('Residuals of the linear data')

# Residual plot for the nonlinear data
sns.residplot(ax=ax[1], x=y_non_linear_pred.flatten(), y=residuals_non_linear.flatten(), lowess=True, line_kws={'color': 'red', 'lw': 1})
ax[1].axhline(y=0, color='gray', linestyle='--')
ax[1].set_xlabel('Predicted value')
ax[1].set_ylabel('Residual')
ax[1].set_title('Residuals of the nonlinear data')

plt.show()

Output result:
[Figure: residual plots for the linear data (left) and the nonlinear data (right)]

A residual plot (Residual Plot) is a scatter plot with the residual, i.e. the difference between the target value and the predicted value, on the vertical axis and any other specified quantity (here, the predicted value) on the horizontal axis. In the figure above, the residuals in the left panel look like zero-mean white noise, while the residuals in the right panel clearly contain structure. The residual plot of a genuinely linear relationship should not contain any predictable information.

independence

The error terms should be independent, meaning that the error of each data point is not correlated with the errors of other data points. If the errors are not independent, the model may be missing some information, leading to biased predictions.

How to judge independence:
The Durbin-Watson test is a common method for detecting autocorrelation between the errors of consecutive observations. The Durbin-Watson statistic ranges from 0 to 4: values below about 1.5 suggest positive autocorrelation, values above about 2.5 suggest negative autocorrelation, and values between roughly 1.5 and 2.5 indicate little or no autocorrelation.
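
For reference, the Durbin-Watson statistic is computed from the residuals e_t as follows (this is the standard definition; it is not stated in the original article):

d = \frac{\sum_{t=2}^{T}\left(e_t - e_{t-1}\right)^2}{\sum_{t=1}^{T} e_t^2}

When consecutive residuals are close to each other (positive autocorrelation) the numerator shrinks and d falls toward 0; when they tend to alternate in sign, d rises toward 4.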

Independence of judgment:

"""
@Module Name: 通过残差图判断独立性.py
@Author: CSDN@我是小白呀
@Date: October 17, 2023

Description:
Using residual plots to check for independence
"""
import numpy as np
import statsmodels.api as sm
from statsmodels.regression.linear_model import OLS
import matplotlib.pyplot as plt
import seaborn as sns

plt.style.use("fivethirtyeight")  # set the plotting style
plt.rcParams['font.sans-serif'] = ['SimHei']  # set the 'SimHei' font so CJK characters can be displayed
plt.rcParams['axes.unicode_minus']=False


# Set the random seed
np.random.seed(0)

# Data with independent residuals
X = np.random.rand(100, 1) * 10  # 100 samples in the range 0-10
y = 3 * X + np.random.randn(100, 1) * 2  # linear relationship + Gaussian noise

# Data with positively autocorrelated residuals
X_positive = np.arange(100)
y_positive = np.array([1 for _ in range(100)])

# Residuals of the independent data
model = OLS(y, X).fit()  # linear regression with statsmodels
residuals = model.resid  # get the residuals

# Residuals of the positively autocorrelated data
model = OLS(y_positive, X_positive).fit()  # linear regression with statsmodels
residuals_positive = model.resid  # get the residuals

# Run the Durbin-Watson test
dw = sm.stats.durbin_watson(residuals)
dw_positive = sm.stats.durbin_watson(residuals_positive)

# Debug output
print("Durbin-Watson for independent residuals:", dw)
print("Durbin-Watson for positively autocorrelated residuals:", dw_positive)

# Draw the residual plots
f, ax = plt.subplots(1, 2, figsize=(16, 8))
f.suptitle('Durbin-Watson autocorrelation (independence)')

# Residuals without autocorrelation
sns.scatterplot(ax=ax[0], x=[i for i in range(100)], y=residuals)
ax[0].axhline(y=0, color='gray', linestyle='--')
ax[0].set_ylabel('Residual')
ax[0].set_title('No autocorrelation')

# Residuals with positive autocorrelation
sns.scatterplot(ax=ax[1], x=[i for i in range(100)], y=residuals_positive)
ax[1].axhline(y=0, color='gray', linestyle='--')
ax[1].set_ylabel('Residual')
ax[1].set_title('Positive autocorrelation')

plt.show()

Output result:

[Figure: residual scatter plots, no autocorrelation (left) vs. positive autocorrelation (right)]

Durbin-Watson for independent residuals: 2.100345841035149
Durbin-Watson for positively autocorrelated residuals: 0.0008866112741927459

Note: positive autocorrelation (Positive Autocorrelation) means that the residuals show a clear increasing or decreasing trend, i.e. each residual is highly correlated with the ones before it rather than being random noise. The left panel is random noise with no autocorrelation, and its Durbin-Watson value lies between 1.5 and 2.5; the right panel shows a high degree of positive autocorrelation.

homoskedasticity

Homoscedasticity means that the variance of the errors should remain constant regardless of the values of the predictor variables. Homoscedasticity gives regression analysis its stability and efficiency. When it does not hold, the regression model may be biased toward certain observations, which affects its overall predictive performance. The opposite situation, where the error variance changes with the predictors, is called heteroskedasticity.

[Figure: residual plots illustrating homoskedasticity (left) and heteroskedasticity (right)]
Residual plots are a common tool for verifying homoskedasticity. By plotting the residuals we can check whether their spread is uniform. If the points in the residual plot are randomly scattered around the zero line, homoskedasticity holds; if they exhibit a pattern, such as spreading out as the predicted value increases, the errors are heteroskedastic. In the figure above, the left panel shows homoskedasticity and the right panel shows heteroskedasticity.
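
The original article does not include code for this check, so here is a minimal sketch in the same style as the earlier examples: it simulates homoskedastic data (constant noise) and heteroskedastic data (noise that grows with x) and plots the residuals of a fitted linear model.

from sklearn.linear_model import LinearRegression
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)

X = np.random.rand(200, 1) * 10                  # 200 samples in the range 0-10
y_homo = 3 * X + np.random.randn(200, 1) * 2     # constant noise variance (homoskedastic)
y_hetero = 3 * X + np.random.randn(200, 1) * X   # noise grows with X (heteroskedastic)

f, ax = plt.subplots(1, 2, figsize=(16, 8))
for i, (y, title) in enumerate([(y_homo, 'Homoskedastic'), (y_hetero, 'Heteroskedastic')]):
    model = LinearRegression().fit(X, y)
    residuals = y - model.predict(X)
    ax[i].scatter(model.predict(X), residuals)
    ax[i].axhline(y=0, color='gray', linestyle='--')
    ax[i].set_xlabel('Predicted value')
    ax[i].set_ylabel('Residual')
    ax[i].set_title(title)

plt.show()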

Normally distributed error

Normality of Error refers to the requirement that the errors (residuals) of a linear regression model should follow a normal distribution.

example:

"""
@Module Name: 通过残差图判断正态分布.py
@Author: CSDN@我是小白呀
@Date: October 17, 2023

Description:
Using residual histograms to check for normally distributed errors
"""
from sklearn.linear_model import LinearRegression
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

plt.style.use("fivethirtyeight")  # set the plotting style
plt.rcParams['font.sans-serif'] = ['SimHei']  # set the 'SimHei' font so CJK characters can be displayed
plt.rcParams['axes.unicode_minus']=False

# Set the random seed
np.random.seed(0)

# Simulate linear data: y = 3x + noise
X_linear = np.random.rand(1000, 1) * 10  # 1000 samples in the range 0-10
y_linear = 3 * X_linear + np.random.randn(1000, 1) * 2  # linear relationship + Gaussian noise

# Simulate nonlinear data: y = 3x^2 + noise
X_non_linear = np.random.rand(1000, 1) * 10  # 1000 samples in the range 0-10
y_non_linear = 3 * X_non_linear**2 + np.random.randn(1000, 1) * 2  # quadratic relationship + Gaussian noise


# Instantiate the linear regression model
model = LinearRegression()

# Fit the linear data
model.fit(X_linear, y_linear)  # train
y_linear_pred = model.predict(X_linear)  # predict
residuals_linear = y_linear - y_linear_pred  # residuals

# Fit the nonlinear data
model.fit(X_non_linear, y_non_linear)  # train
y_non_linear_pred = model.predict(X_non_linear)  # predict
residuals_non_linear = y_non_linear - y_non_linear_pred  # residuals

# Draw the residual histograms
f, ax = plt.subplots(1, 2, figsize=(16, 8))
f.suptitle('Normal distribution of residuals')

# Residual ranges
print("Linear residual range:", residuals_linear.min(), "-", residuals_linear.max())
print("Nonlinear residual range:", residuals_non_linear.min(), "-", residuals_non_linear.max())

# Histogram for the linear data
sns.histplot(ax=ax[0], data=residuals_linear.flatten(), bins=20)
ax[0].set_xlabel('Residual value')
ax[0].set_ylabel('Count')
ax[0].set_title('Residual distribution: linear data')

# Histogram for the nonlinear data
sns.histplot(ax=ax[1], data=residuals_non_linear.flatten(), bins=20)
ax[1].set_xlabel('Residual value')
ax[1].set_ylabel('Count')
ax[1].set_title('Residual distribution: nonlinear data')

plt.show()

Output result:
[Figure: histograms of the residuals for the linear data (left) and the nonlinear data (right)]

Linear residual range: -6.064887209285082 - 6.284146438819695
Nonlinear residual range: -30.07255606488623 - 53.966957811451316

We can see that the residuals of the data with a linear relationship follow a normal distribution, while the residuals of the nonlinear data do not.

gradient descent

Gradient Descent is an iterative method for optimizing a function. In machine learning and deep learning, we often use gradient descent to minimize the loss function and thereby find the model's optimal parameters.

working principle

The core idea of gradient descent is simple: compute the slope (gradient) of the loss function, then update the model's parameters in the opposite direction of the gradient so that the loss gradually decreases.

[Figure: illustration of gradient descent toward the minimum of a loss function]

Gradient descent formula

Formula:
w_{next} = w - lr \times \frac{\partial loss}{\partial w}

  • w_{next}: the next value of the weight in the gradient descent process
  • lr: the learning rate (Learning Rate)
  • \frac{\partial loss}{\partial w}: the derivative of the loss function with respect to w

Let us derive the gradients for simple linear regression. We already know:
MSE = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2, \qquad y = w_0 + w_1 x

In gradient descent, our goal is to adjust the model parameters (the weight and the bias, here w_1 and w_0) to minimize the loss function, so we need the partial derivatives of the loss with respect to those parameters:

Derivative with respect to w_1 (the weight):
\frac{\partial MSE}{\partial w_1} = \frac{1}{n}\sum_{i=1}^{n}\frac{\partial}{\partial w_1}\left(w_0 + w_1 x_i - y_i\right)^2 = \frac{2}{n}\sum_{i=1}^{n} x_i\left(w_0 + w_1 x_i - y_i\right)

Derivative with respect to w_0 (the bias):
\frac{\partial MSE}{\partial w_0} = \frac{1}{n}\sum_{i=1}^{n}\frac{\partial}{\partial w_0}\left(w_0 + w_1 x_i - y_i\right)^2 = \frac{2}{n}\sum_{i=1}^{n}\left(w_0 + w_1 x_i - y_i\right)

If there were more weights (w_2, w_3, and so on), the derivations would be analogous, so they are omitted here.

Substituting these derivatives into the gradient descent formula gives the new weights:
w_{1,next} = w_1 - lr \times \frac{\partial loss}{\partial w_1} = w_1 - lr \times \frac{2}{n}\sum_{i=1}^{n} x_i\left(w_0 + w_1 x_i - y_i\right)

Similarly:
w_{0,next} = w_0 - lr \times \frac{\partial loss}{\partial w_0} = w_0 - lr \times \frac{2}{n}\sum_{i=1}^{n}\left(w_0 + w_1 x_i - y_i\right)

In the code:

# Derivative with respect to w: dL/dw = 2x(wx + b - y), accumulated over the N samples
w_gradient += (2 / N) * x * ((w_current * x + b_current) - y)

# Derivative with respect to b: dL/db = 2(wx + b - y), accumulated over the N samples
b_gradient += (2 / N) * ((w_current * x + b_current) - y)
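
Putting the pieces together, here is a minimal, self-contained sketch of batch gradient descent for the one-feature model y = wx + b. The data and hyperparameters are made up for illustration; the variable names mirror the snippet above, but the full loop is an addition, not code from the original article.

import numpy as np

# Made-up training data roughly following y = 3x + 1
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([4.2, 6.9, 10.1, 12.8, 16.2])
N = len(x)

w_current, b_current = 0.0, 0.0
lr = 0.01  # learning rate

for _ in range(5000):
    pred = w_current * x + b_current
    # Gradients of MSE: dL/dw = (2/N) * sum(x * (pred - y)), dL/db = (2/N) * sum(pred - y)
    w_gradient = (2 / N) * np.sum(x * (pred - y))
    b_gradient = (2 / N) * np.sum(pred - y)
    # Step against the gradient
    w_current -= lr * w_gradient
    b_current -= lr * b_gradient

print("w:", w_current, "b:", b_current)  # should end up close to 3 and 1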

Variations of Gradient Descent

  • Batch Gradient Descent (BGD): uses the entire training set to compute the gradient at each step
  • Stochastic Gradient Descent (SGD): uses only one sample at a time to compute the gradient
  • Mini-Batch Gradient Descent: uses a small batch of samples to compute the gradient (a minimal sketch follows below)
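
As an illustration of the mini-batch variant, the following sketch (an addition, with assumed data) draws a random batch of samples at each step and reuses the same gradient formulas:

import numpy as np

np.random.seed(0)
X = np.random.rand(1000) * 10           # 1000 samples, one feature
y = 3 * X + 1 + np.random.randn(1000)   # y = 3x + 1 + noise

w, b = 0.0, 0.0
lr, batch_size = 0.01, 32

for _ in range(2000):
    idx = np.random.choice(len(X), batch_size, replace=False)  # pick a random mini-batch
    xb, yb = X[idx], y[idx]
    pred = w * xb + b
    w -= lr * (2 / batch_size) * np.sum(xb * (pred - yb))
    b -= lr * (2 / batch_size) * np.sum(pred - yb)

print("w:", w, "b:", b)  # should end up close to 3 and 1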

learning rate

Think of minimizing the loss as walking down into a valley: the lowest point of the valley is the goal of our objective function, i.e. the parameter values at which the objective function reaches its minimum.

How do we walk down the mountain, one step at a time?

  1. Find the most appropriate direction at the current position
  2. Take a small step
  3. Update the parameters according to that direction and step size

Learning rate (learning_rate): has a large impact on the result; it is usually better to start with a small value and adjust.

Batch size (batch_size): chosen mainly with memory and efficiency in mind; it usually matters less than the learning rate.

Choosing an appropriate learning rate is very important:

  • A learning rate that is too small: slow convergence
  • A learning rate that is too large: the minimum may be overshot, or the gradient may even explode

Common techniques include:

  • Learning rate decay (Learning Rate Decay) or cosine annealing (Cosine Annealing), where the learning rate gradually decreases as training progresses (a minimal sketch follows below)
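
A minimal sketch of these two schedules (the formulas are the standard ones; the constants are assumptions chosen for illustration):

import math

lr0, lr_min, total_steps = 0.1, 0.001, 100  # assumed initial/minimum learning rate and schedule length

def lr_exponential_decay(step, decay_rate=0.95):
    """Exponential decay: the learning rate shrinks by a constant factor each step."""
    return lr0 * decay_rate ** step

def lr_cosine_annealing(step):
    """Cosine annealing: the learning rate follows half a cosine from lr0 down to lr_min."""
    return lr_min + 0.5 * (lr0 - lr_min) * (1 + math.cos(math.pi * step / total_steps))

for step in (0, 50, 100):
    print(step, round(lr_exponential_decay(step), 5), round(lr_cosine_annealing(step), 5))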

Linear regression in practice: predicting Boston housing prices

"""
@Module Name: 线性回归预测波士顿房价.py
@Author: CSDN@我是小白呀
@Date: October 17, 2023

Description:
Predicting Boston housing prices with linear regression
"""
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
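# Note: load_boston was removed in scikit-learn 1.2; this example assumes an older
# scikit-learn version (< 1.2) in which the Boston housing dataset is still available.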


# Load the dataset
boston_data = load_boston()
X = boston_data.data
y = boston_data.target

# Print basic information about the data
print("Features:", X[:5])
print("Labels:", y[:5])

# Split the dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Instantiate the model
lin_reg = LinearRegression()

# Train the model
lin_reg.fit(X_train, y_train)

# Predict
y_pred = lin_reg.predict(X_test)

# Evaluate the model
MSE = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("Mean Squared Error (MSE):", MSE)
print("R^2 Score:", r2)

Output result:

Features: [[6.3200e-03 1.8000e+01 2.3100e+00 0.0000e+00 5.3800e-01 6.5750e+00
  6.5200e+01 4.0900e+00 1.0000e+00 2.9600e+02 1.5300e+01 3.9690e+02
  4.9800e+00]
 [2.7310e-02 0.0000e+00 7.0700e+00 0.0000e+00 4.6900e-01 6.4210e+00
  7.8900e+01 4.9671e+00 2.0000e+00 2.4200e+02 1.7800e+01 3.9690e+02
  9.1400e+00]
 [2.7290e-02 0.0000e+00 7.0700e+00 0.0000e+00 4.6900e-01 7.1850e+00
  6.1100e+01 4.9671e+00 2.0000e+00 2.4200e+02 1.7800e+01 3.9283e+02
  4.0300e+00]
 [3.2370e-02 0.0000e+00 2.1800e+00 0.0000e+00 4.5800e-01 6.9980e+00
  4.5800e+01 6.0622e+00 3.0000e+00 2.2200e+02 1.8700e+01 3.9463e+02
  2.9400e+00]
 [6.9050e-02 0.0000e+00 2.1800e+00 0.0000e+00 4.5800e-01 7.1470e+00
  5.4200e+01 6.0622e+00 3.0000e+00 2.2200e+02 1.8700e+01 3.9690e+02
  5.3300e+00]]
Labels: [24.  21.6 34.7 33.4 36.2]
Mean Squared Error (MSE): 33.4489799976764
R^2 Score: 0.5892223849182523

Implementing linear regression by hand

For a better understanding, we conclude by implementing linear regression without using any machine learning package.

Code:

"""
@Module Name: 手把手实现线性回归.py
@Author: CSDN@我是小白呀
@Date: October 17, 2023

Description:
Implementing the linear regression algorithm step by step
"""
class LinearRegression:
    def __init__(self, learning_rate=0.000003, num_iterations=100000):
        """
        Initialize the linear regression model
        :param learning_rate: learning rate
        :param num_iterations: number of iterations
        """
        self.learning_rate = learning_rate
        self.num_iterations = num_iterations
        self.weights = None  # weights w
        self.bias = None  # bias b

    def fit(self, X, y):
        """
        Train the model
        :param X: training features
        :param y: training labels
        :return:
        """
        num_samples, num_features = X.shape
        self.weights = [0] * num_features
        self.bias = 0

        # Gradient descent
        for _ in range(self.num_iterations):
            model_output = self._predict(X)

            # Compute the gradients
            # Derivative with respect to w: dL/dw = 2x(wx + b - y)
            d_weights = (-2/num_samples) * X.T.dot(y - model_output)

            # Derivative with respect to b: dL/db = 2(wx + b - y)
            d_bias = (-2/num_samples) * sum(y - model_output)

            # Update the parameters
            self.weights -= self.learning_rate * d_weights
            self.bias -= self.learning_rate * d_bias

    def predict(self, X):
        return self._predict(X)

    def _predict(self, X):
        return X.dot(self.weights) + self.bias


if __name__ == '__main__':
    import pandas as pd
    from sklearn.datasets import load_boston
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import mean_squared_error, r2_score


    # Load the dataset
    boston_data = load_boston()
    X = boston_data.data
    y = boston_data.target

    # Check for NaN values
    # print(pd.DataFrame(X).isnull().sum())

    # Print basic information about the data
    print("Features:", X[:5])
    print("Labels:", y[:5])

    # Split the dataset
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

    # Instantiate the model
    lin_reg = LinearRegression()

    # Train the model
    lin_reg.fit(X_train, y_train)

    # Predict
    y_pred = lin_reg.predict(X_test)

    # Evaluate the model
    MSE = mean_squared_error(y_test, y_pred)
    r2 = r2_score(y_test, y_pred)

    print("Mean Squared Error (MSE):", MSE)
    print("R^2 Score:", r2)

Output result:

Features: [[6.3200e-03 1.8000e+01 2.3100e+00 0.0000e+00 5.3800e-01 6.5750e+00
  6.5200e+01 4.0900e+00 1.0000e+00 2.9600e+02 1.5300e+01 3.9690e+02
  4.9800e+00]
 [2.7310e-02 0.0000e+00 7.0700e+00 0.0000e+00 4.6900e-01 6.4210e+00
  7.8900e+01 4.9671e+00 2.0000e+00 2.4200e+02 1.7800e+01 3.9690e+02
  9.1400e+00]
 [2.7290e-02 0.0000e+00 7.0700e+00 0.0000e+00 4.6900e-01 7.1850e+00
  6.1100e+01 4.9671e+00 2.0000e+00 2.4200e+02 1.7800e+01 3.9283e+02
  4.0300e+00]
 [3.2370e-02 0.0000e+00 2.1800e+00 0.0000e+00 4.5800e-01 6.9980e+00
  4.5800e+01 6.0622e+00 3.0000e+00 2.2200e+02 1.8700e+01 3.9463e+02
  2.9400e+00]
 [6.9050e-02 0.0000e+00 2.1800e+00 0.0000e+00 4.5800e-01 7.1470e+00
  5.4200e+01 6.0622e+00 3.0000e+00 2.2200e+02 1.8700e+01 3.9690e+02
  5.3300e+00]]
Labels: [24.  21.6 34.7 33.4 36.2]
Mean Squared Error (MSE): 43.28945498976922
R^2 Score: 0.46837424997350163

Origin blog.csdn.net/weixin_46274168/article/details/133881708