Linear regression: predicting movie theater attendance (multivariable linear regression)

Experiment content: Prediction of the number of movie theater viewers

Experimental requirements

1. Read the data set from the given file.

2. Draw a scatter plot between the number of movie theater viewers (filmnum) and the theater area (filmsize).

3. Draw the scatter plot matrix of the cinema attendance data set.

4. Select the feature variables and the response variable, and split the data into training and test sets.

5. Perform linear regression model training.

6. Predict the test set based on the calculated parameters.

7. Plot the actual values of the response variable in the test set against the predicted values.

8. Evaluate the prediction results.
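
As background for steps 4 to 6, multivariable linear regression models the response as a weighted sum of the features plus an intercept, and training picks the weights that minimize the squared error. The following is a minimal sketch of that idea using NumPy's least-squares solver. The data is synthetic and the coefficient values are made up for illustration; they do not come from the cinema data set.

```python
import numpy as np

# Hypothetical synthetic data: 50 samples with 3 features,
# standing in for the three feature columns of the real data set.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 10.0, size=(50, 3))
true_coef = np.array([2.0, -1.0, 0.5])   # illustrative values only
true_intercept = 4.0
y = X @ true_coef + true_intercept        # noise-free response

# Append a column of ones so the intercept is estimated with the weights.
A = np.column_stack([X, np.ones(len(X))])

# Ordinary least squares: minimize ||A @ solution - y||^2.
solution, *_ = np.linalg.lstsq(A, y, rcond=None)
coef, intercept = solution[:3], solution[3]

print("coefficients:", coef)
print("intercept:", intercept)
```

Because the synthetic response has no noise, the recovered coefficients and intercept should match the illustrative values up to floating-point error. On real data the fit is only approximate, which is why the prediction results are evaluated in step 8.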

Complete code

# Import packages
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression


# 1. Read the data set from the given file. (Data set path: data/data72160/1_film.csv)
film=pd.read_csv("E:/python/SpyderCode/work/1_film.csv")
# Inspect the first five rows and the overall shape
print(film.head())
print(film.shape)


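# 2. Draw a scatter plot of the number of viewers (filmnum) against the theater area (filmsize).
X=film.iloc[:,0:1]  # first column, assumed to be filmnum
y=film.iloc[:,1:2]  # second column, assumed to be filmsize

plt.figure(figsize=(10,6))
plt.scatter(X,y)  # draw the scatter plot
plt.xlabel('filmnum')
plt.ylabel('filmsize')
plt.title('The relation of filmnum and filmsize')  # title
plt.show()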



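# 3. Draw the scatter plot matrix of the cinema attendance data set.
import seaborn as sns
# Pass the whole DataFrame so every pair of numeric columns is plotted;
# a single column would not produce a matrix.
# Recent seaborn versions use `height` in place of the old `size` argument.
sns.pairplot(film,kind='reg',height=2.5)
plt.tight_layout()
plt.show()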



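# 4. Select the feature variables and the response variable, and split the data.
X=film.iloc[:,1:4].values  # columns 1 to 3 are the features
y=film.filmnum.values      # filmnum is the response
# Hold out 25% of the rows as the test set
train_X,test_X,train_y,test_y=train_test_split(X,y,test_size=0.25)
print(train_X.shape)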


# 5. Train the linear regression model.
model=LinearRegression()
model.fit(train_X,train_y)  # fit the model on the training set



# 6. Predict the test set using the fitted parameters.
print("Coefficients:",model.coef_)  # fitted coefficients
print("Intercept:",model.intercept_)
pre_y=model.predict(test_X)
print("Predicted y values for the test set X:\n",pre_y)



# 7. Plot the actual values of the response variable in the test set against the predicted values.
plt.figure(figsize=(10,6))
t=np.arange(len(test_X))  # sample index used as the x axis

plt.plot(t,test_y,linewidth=2,label="test_y",color="red")
plt.plot(t,pre_y,linewidth=2,label="pre_y",color="blue")

plt.legend()  # show the legend

plt.xlabel("test data")
plt.ylabel("filmnum")
plt.show()
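
The code above stops at the comparison plot, so step 8 (evaluating the prediction results) is left open. One possible way to do it is sketched below: a small helper that computes the mean squared error, its square root, the mean absolute error, and the coefficient of determination R². The helper name `regression_metrics` and the sample numbers are hypothetical and chosen only so the result is easy to check by hand; in the experiment the inputs would be `test_y` and `pre_y`.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return common regression error metrics as a dict (illustrative helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred

    mse = np.mean(residuals ** 2)                    # mean squared error
    ss_res = np.sum(residuals ** 2)                  # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares

    return {
        "mse": mse,
        "rmse": np.sqrt(mse),
        "mae": np.mean(np.abs(residuals)),
        "r2": 1.0 - ss_res / ss_tot,
    }

# Made-up values for illustration, not results from the cinema data set.
metrics = regression_metrics([3, 5, 7, 9], [2, 5, 8, 9])
print(metrics)
```

scikit-learn offers the same quantities through `sklearn.metrics.mean_squared_error`, `mean_absolute_error`, and `r2_score`, and `model.score(test_X, test_y)` returns R² for a fitted `LinearRegression`. An R² close to 1 means the predictions explain most of the variation in the test responses.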

Origin blog.csdn.net/weixin_48434899/article/details/123827544