Linear regression with gradient descent: the diabetes dataset

Data set: https://scikit-learn.org/stable/datasets/

Features:

age: age in years

sex: sex

bmi: body mass index

bp: average blood pressure

s1, s2, s3, s4, s5, s6: six blood serum measurements

Label:

A quantitative measure of disease progression one year after baseline
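
To check the columns described above, the same dataset can be loaded straight from scikit-learn. This is only a sketch: it assumes the diabetes.csv used later in this post has the same ten feature columns followed by the label, which the post does not confirm. Note also that scikit-learn's copy is mean-centered and scaled by default, so the values may differ from a raw CSV.

```python
from sklearn.datasets import load_diabetes

# as_frame=True returns pandas objects; .frame holds the 10 features
# plus a final "target" column (the disease progression label).
diabetes = load_diabetes(as_frame=True)
frame = diabetes.frame

print(frame.shape)          # (rows, columns)
print(list(frame.columns))  # feature names followed by "target"

# Optional, and an assumption about how the CSV in this post was made:
# frame.to_csv("diabetes.csv", index=False)
```

Because the label is the last column of the frame, the iloc-based split used later in the post would work unchanged on a CSV exported this way.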

 

First, import the libraries

import numpy as np
import pandas as pd
from sklearn import datasets
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
import matplotlib as mpl
import matplotlib.pyplot as plt
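# Use the SimHei font so that Chinese text in figures renders correctly.
mpl.rcParams["font.family"] = "SimHei"
# Keep the minus sign (-) displaying correctly when a Chinese font is in use.
mpl.rcParams["axes.unicode_minus"] = False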

Second, preprocess the data

# Load the dataset
data = pd.read_csv(r"diabetes.csv",header=0)
#data.sample(30)
#data.info()
# Check for outliers
#data.describe()
# Check for duplicate rows
#data.duplicated().any()
# If there are duplicates, they can be removed like this
# data.drop_duplicates(inplace=True)
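
The checks commented out above can be tried on a small table first. The sketch below uses made-up values (not rows from the diabetes data) with one exact duplicate, and shows what duplicated(), isna() and drop_duplicates() report.

```python
import pandas as pd

# Toy data, invented for illustration; row 2 is an exact copy of row 0.
toy = pd.DataFrame({
    "bmi": [0.06, -0.05, 0.06, 0.04],
    "bp": [0.02, -0.03, 0.02, 0.01],
    "target": [151.0, 75.0, 151.0, 141.0],
})

# True if any row repeats an earlier row exactly.
has_duplicates = toy.duplicated().any()
# Number of missing values in each column.
missing_per_column = toy.isna().sum()

# drop_duplicates keeps the first occurrence; reset_index renumbers the rows.
cleaned = toy.drop_duplicates().reset_index(drop=True)

print(has_duplicates)
print(missing_per_column)
print(len(toy), "->", len(cleaned))
```

Returning a new frame instead of using inplace=True keeps the original data available for comparison, which can be handy while exploring.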

Third, train and evaluate the model
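
Before handing the work to the library, it may help to see what gradient descent does for linear regression. The following is a minimal full-batch sketch on synthetic data, not what SGDRegressor runs internally (that estimator updates the weights one sample at a time and supports penalties). The data, the learning rate and the iteration count are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (made up): y = 3*x0 - 2*x1 + 5 plus a little noise.
X = rng.normal(size=(200, 2))
true_w = np.array([3.0, -2.0])
true_b = 5.0
y = X @ true_w + true_b + rng.normal(scale=0.1, size=200)

w = np.zeros(2)
b = 0.0
learning_rate = 0.1  # hypothetical value; real data usually needs tuning

for _ in range(500):
    error = X @ w + b - y                 # prediction minus target
    grad_w = 2.0 * X.T @ error / len(y)   # d(MSE)/dw
    grad_b = 2.0 * error.mean()           # d(MSE)/db
    w -= learning_rate * grad_w           # step against the gradient
    b -= learning_rate * grad_b

print(w, b)
```

Each step moves the weights a small distance in the direction that lowers the mean squared error, so the estimates should end up close to the coefficients used to generate the data.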

# Split the loaded dataset into features X and label y.
X, y = data.iloc[:, :-1], data.iloc[:, -1]
# Use train_test_split to divide the data into a training set and a test set; the test set takes 25% of the samples
train_X, test_X, train_y, test_y = train_test_split(X, y, test_size=0.25, random_state=1)
#display(len(train_y))
#display(len(test_y))

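# Instantiate a linear regression model trained by stochastic gradient descent
# (loss='squared_loss' was renamed to 'squared_error' in scikit-learn 1.0 and removed in 1.2)
sgd = SGDRegressor(loss='squared_error',penalty='l1',alpha=0.01,max_iter=10000)
# Train the model
sgd.fit(train_X,train_y)
# Predict on the test set
result=sgd.predict(test_X)
# Evaluate the model

display(result)
display(test_y.values)
# Mean squared error
print("Mean squared error: %.2f" % mean_squared_error(test_y, result))
# R^2 (coefficient of determination): 1 means perfect prediction
print('R^2 score: %.2f' % r2_score(test_y, result))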
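
To make the two metrics concrete, here is a small sketch that computes them by hand with NumPy and compares them with the scikit-learn functions. The four values are made up for illustration and have nothing to do with the diabetes results.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up true values and predictions.
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

# Sum of squared residuals, and total sum of squares around the mean.
residual_ss = np.sum((y_true - y_pred) ** 2)
total_ss = np.sum((y_true - y_true.mean()) ** 2)

mse_manual = residual_ss / len(y_true)     # average squared error
r2_manual = 1.0 - residual_ss / total_ss   # 1 means perfect prediction

print(mse_manual, mean_squared_error(y_true, y_pred))
print(r2_manual, r2_score(y_true, y_pred))
```

The score equals 1 only when every prediction matches, drops to 0 for a model that always predicts the mean, and can go negative for a model that does worse than that.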

Fourth, visualize the results

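plt.figure(figsize=(15, 10))
# Plot the predicted values
plt.plot(result, "ro-", label="Predicted")
# Plot the true values
plt.plot(test_y.values, "go--", label="Actual")
plt.title("Linear regression predictions (gradient descent)")
plt.xlabel("Sample index")
plt.ylabel("Disease progression after one year")
plt.legend(loc="best")
plt.show()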

 


Source: blog.csdn.net/weixin_42295205/article/details/91619944