[Gradient descent applied to Boston house price prediction (ridge regression)]

data preparation

First, we need to obtain the Boston house price dataset and process the data. We obtain data from the CMU Statistical Learning Dataset repository and divide it into training and testing sets.

import pandas as pd
import numpy as np

data_url = "http://lib.stat.cmu.edu/datasets/boston"
raw_df = pd.read_csv(data_url, sep="\s+", skiprows=22, header=None)
data = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
target = raw_df.values[1::2, 2]

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(data, target)

Then, we normalize the data to ensure that the features are on the same scale. This is to avoid unreasonable changes in feature weights during model training.

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

Ridge regression

Ridge regression is a regularization method of linear regression, which limits the model parameters by adding a penalty term of the L2 norm to the loss function. We will use ridge regression as an optimization method for gradient descent.

from sklearn.linear_model import Ridge

ridge = Ridge()
ridge.fit(X_train, y_train)

After the training is complete, we can view the parameters and intercept of the ridge regression model and evaluate the performance of the model on the training and test sets.

print("岭回归模型参数:{}".format(ridge.coef_))
print("岭回归模型截距:{}".format(ridge.intercept_))
print("岭回归模型训练集得分:{:.2f}".format(ridge.score(X_train, y_train)))
print("岭回归模型测试集得分:{:.2f}".format(ridge.score(X_test, y_test)))

model evaluation

We use Mean Squared Error (Mean Squared Error, MSE) to evaluate the performance of the model, and the smaller the MSE, the better the model fitting effect.

from sklearn.metrics import mean_squared_error

y_pred = ridge.predict(X_test)
print("岭回归模型均方误差:{:.2f}".format(mean_squared_error(y_test, y_pred)))

Visualization

Finally, we use the visualization tool matplotlib to display the real and predicted housing prices of the test set, and visually observe the prediction effect of the model.

import matplotlib.pyplot as plt

plt.figure(figsize=(10, 6))
plt.plot(range(len(y_test)), y_test, "r", label="y_test")
plt.plot(range(len(y_pred)), y_pred, "b--", label="y_pred")
plt.legend()
plt.show()

In the figure, the red curve represents the real house price in the test set, and the blue dotted line represents the house price prediction result of the ridge regression model for the test set. We can visually see whether the forecast results are consistent with the real house price trend.

Guess you like

Origin blog.csdn.net/qq_66726657/article/details/131969687