Data Mining Classification Algorithms - Regression Extended Exercise

1) Match each regression coefficient with its feature name, one by one. The result can be in any form (DataFrame, array, list, dict, ...).


from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.datasets import fetch_california_housing
import pandas as pd
import numpy as np
# Preparation
housevalue = fetch_california_housing()
print(housevalue)  # preview the data
x = pd.DataFrame(housevalue.data)  # store the features as a DataFrame
print('Data shape', x.shape)  # check the dimensions

y = housevalue.target
# print(y)
print(housevalue.feature_names)
# Use the feature names as the column names of x
x.columns = housevalue.feature_names
print(x)



# Preview the features and the target side by side
print(pd.concat([x, pd.DataFrame(y)], axis=1).head())
# Split into training and test sets
Xtrain, Xtest, Ytrain, Ytest = train_test_split(housevalue.data, housevalue.target, test_size=0.2, random_state=30)
print(Xtest)
print(Xtrain)

clf_reg=LinearRegression()
clf_reg.fit(Xtrain,Ytrain)
Ypred = clf_reg.predict(Xtest)
print(Ypred)
print('Column names of x', x.columns)
print('Coefficients', clf_reg.coef_)

print('Intercept of the regression equation', clf_reg.intercept_)
# Method 1: store the feature names and the coefficients as two DataFrames, then join them
pd1 = pd.concat([pd.DataFrame(housevalue.feature_names), pd.DataFrame(clf_reg.coef_)], axis=1)
# Both frames have a column named 0, so assign distinct names first to keep the rename unambiguous
pd1.columns = list('01')
# Rename the header
pd2 = pd1.rename(columns={'0': 'feature_name', '1': 'coefficient'})
print('Method 1\n', pd2)

# Method 2: a two-column NumPy array of (name, coefficient)
pd3 = np.c_[x.columns, clf_reg.coef_]
print('Method 2\n', pd3)

# Method 3: a list of (name, coefficient) tuples
pd4 = list(zip(x.columns, clf_reg.coef_))
print('Method 3\n', pd4)

Screenshot of the result:

2) Use the metrics module to calculate the mean absolute error, MAE (mean_absolute_error).

Screenshots of code and results:

from sklearn import metrics
mae = metrics.mean_absolute_error(Ytest,Ypred)
print("MAE computed with metrics\n", mae)
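MAE is the mean of the absolute residuals, MAE = (1/n) Σ|y_i − ŷ_i|. The small check below uses hypothetical toy arrays, not the housing data. It confirms that computing MAE by hand with NumPy gives the same value as metrics.mean_absolute_error.

```python
import numpy as np
from sklearn import metrics

# Hypothetical toy labels and predictions
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

# MAE = mean of the absolute residuals |y_true - y_pred|
mae_manual = np.mean(np.abs(y_true - y_pred))
mae_sklearn = metrics.mean_absolute_error(y_true, y_pred)
print(mae_manual, mae_sklearn)  # both 0.5
```

MAE is in the same unit as the target. For the California housing target, that unit is hundreds of thousands of dollars.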

Screenshot of the result:

Note: R² can be computed either with sklearn.metrics.r2_score() or with the model's .score() method.

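sklearn.metrics.r2_score(y_true, y_pred): takes two arrays. The first holds the true labels (here the test labels Ytest). The second holds the predicted labels (here Ypred, the output of predict() on Xtest). The argument order matters, because the mean in the denominator is taken over the true labels. The function compares the true test labels with the predictions according to the formula

$$R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$$

where y_i are the true labels, ŷ_i the predictions and ȳ the mean of the true labels. The result is the coefficient of determination, not the correlation coefficient. A value of 1 means a perfect fit, and it can become negative when the model does worse than always predicting ȳ.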
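Here is a minimal sketch, assuming hypothetical toy arrays rather than the housing data. It computes R² by hand as 1 − SS_res / SS_tot, where SS_res is the residual sum of squares and SS_tot is the total sum of squares around the mean of the true labels. It then checks the result against r2_score and shows that swapping the two arguments gives a different value.

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical toy labels and predictions
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

# Coefficient of determination: 1 - SS_res / SS_tot
ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares around the true mean
r2_manual = 1 - ss_res / ss_tot

print(r2_manual, r2_score(y_true, y_pred))  # the two values agree
# Swapping the arguments changes the mean used in the denominator, so the result differs
print(r2_score(y_pred, y_true))
```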

.score(X, y): takes the test features Xtest and the test labels Ytest. When called on a fitted LinearRegression() model, it first predicts on Xtest internally. It then returns the R² of those predictions against Ytest, so clf_reg.score(Xtest, Ytest) gives the same value as metrics.r2_score(Ytest, Ypred).
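The two routes can be compared directly with the sketch below. It uses hypothetical synthetic data standing in for the housing set: four made-up features, fixed weights and Gaussian noise. It splits the data with the same train_test_split call pattern as the listing above and checks that .score() and r2_score() return the same number.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Hypothetical synthetic regression data (stand-in for the housing set)
rng = np.random.default_rng(30)
X = rng.normal(size=(500, 4))
y = X @ np.array([1.5, -2.0, 0.0, 0.7]) + rng.normal(scale=0.5, size=500)

Xtrain, Xtest, Ytrain, Ytest = train_test_split(X, y, test_size=0.2, random_state=30)
reg = LinearRegression().fit(Xtrain, Ytrain)

# .score() predicts on Xtest internally, then scores against Ytest
r2_via_score = reg.score(Xtest, Ytest)
# r2_score() needs the predictions passed in explicitly, true labels first
r2_via_metric = r2_score(Ytest, reg.predict(Xtest))
print(r2_via_score, r2_via_metric)
```

.score() is convenient when only the score is needed. r2_score() is handy when the predictions already exist, as Ypred does in the exercise.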


Origin blog.csdn.net/m0_52051577/article/details/130124026