A First Look at Linear Regression

Comparing three ways to fit a linear regression

1. Excel

(1) Using 20 data points:

Regression equation: y = 4.128x - 152.23
R² = 0.3254
(2) Using 200 data points:
Regression equation: y = 3.4317x - 105.96
R² = 0.31
(3) Using 2000 data points:
Regression equation: y = 2.9555x - 73.661
R² = 0.2483
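
Excel's trendline reports R² together with the fitted equation. As a rough cross-check outside Excel, the sketch below computes the same two quantities with NumPy. The data is synthetic (the `height` and `weight` arrays and the random seed are made up for illustration), so the numbers will not match the ones above.

```python
import numpy as np

# Synthetic stand-in data (an assumption; this is not the article's dataset)
rng = np.random.default_rng(0)
height = rng.normal(68.0, 2.0, size=200)
weight = 3.0 * height - 80.0 + rng.normal(0.0, 10.0, size=200)

# Degree-1 least-squares fit; polyfit returns the highest-order coefficient first
slope, intercept = np.polyfit(height, weight, deg=1)

# R^2 = 1 - SS_res / SS_tot
pred = slope * height + intercept
ss_res = np.sum((weight - pred) ** 2)
ss_tot = np.sum((weight - weight.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

print(f"y = {slope:.4f}x {intercept:+.4f}, R^2 = {r2:.4f}")
```

For a one-variable fit like this, R² equals the squared correlation coefficient between x and y, which is a handy sanity check.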

2. Programming in Jupyter (least squares)
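
For reference, the closed-form least-squares estimates that the code in this section computes can be written as follows, where n is the number of samples and \bar{x}, \bar{y} are the sample means:

```latex
a = \frac{\sum_{i=1}^{n} x_i y_i - n\,\bar{x}\,\bar{y}}{\sum_{i=1}^{n} x_i^2 - n\,\bar{x}^2},
\qquad
b = \bar{y} - a\,\bar{x}
```

In the code, the numerator of a is stored in `zi` and the denominator in `mu`.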

Code:


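import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
# Load the CSV file; column 1 is used as x and column 2 as y
points = np.genfromtxt("F:/Python/mydata.csv", delimiter=",", encoding='utf-8')
# Take the first 20 rows
x = points[0:20, 1]
y = points[0:20, 2]
x_mean = np.mean(x)
y_mean = np.mean(y)
xsize = x.size
# Numerator (zi) and denominator (mu) of the least-squares slope
zi = (x * y).sum() - xsize * x_mean * y_mean
mu = (x ** 2).sum() - xsize * x_mean ** 2
# Parameters: slope a and intercept b
a = zi / mu
b = y_mean - a * x_mean
# Round both parameters to two decimal places
a = np.around(a, decimals=2)
b = np.around(b, decimals=2)
# The f prefix is required for {a} and {b} to be substituted
print(f'Equation of linear regression: y = {a}x + {b}')
y1 = a * x + b
plt.scatter(x, y)
plt.plot(x, y1, c='r')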


To use 200 or 2000 data points, replace the 20 in the two slices with that value.
The resulting regression equations are:
20: y = 4.13x + -152.23
200: y = 3.43x + -105.96
2000: y = 2.96x + -73.66
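
Rather than editing the slice by hand for each run, the three sample sizes can be fitted in one loop. The following is a minimal sketch on synthetic data: `fit_line`, `x_all`, `y_all`, and the seed are hypothetical names and values introduced here, not part of the original notebook, so the printed coefficients will differ from those above.

```python
import numpy as np

def fit_line(x, y):
    """Closed-form least-squares slope and intercept for y = a*x + b."""
    n = x.size
    a = ((x * y).sum() - n * x.mean() * y.mean()) / ((x ** 2).sum() - n * x.mean() ** 2)
    b = y.mean() - a * x.mean()
    return a, b

# Synthetic stand-in data (an assumption; replace with the real columns)
rng = np.random.default_rng(1)
x_all = rng.normal(68.0, 2.0, size=2000)
y_all = 3.0 * x_all - 80.0 + rng.normal(0.0, 10.0, size=2000)

results = {}
for n in (20, 200, 2000):
    # Fit on the first n samples only
    a, b = fit_line(x_all[:n], y_all[:n])
    results[n] = (a, b)
    print(f"{n}: y = {a:.2f}x {b:+.2f}")
```

With real data, `x_all` and `y_all` would be the two columns read from the CSV file.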

3. Programming in Jupyter with the sklearn library

Source code:

# Import sklearn and the other packages, then load the data file
import pandas as pd
from numpy import array
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
df = pd.read_excel("D:/Python/mydata.xls")

# Show the size of the data as (rows, columns)
df.shape

# Use 20, 200, or 2000 data points
x = array(df[['Height']].values[:20, :])
y = array(df[['Weight']].values[:20, :])

# x = array(df[['Height']].values[:200, :])
# y = array(df[['Weight']].values[:200, :])

# x = array(df[['Height']].values[:2000, :])
# y = array(df[['Weight']].values[:2000, :])

# Create the linear regression model and fit it
model = LinearRegression()
model.fit(x, y)

# Slope
a = model.coef_
print("Slope =", model.coef_)

# Intercept
b = model.intercept_
print("Intercept =", model.intercept_)

# Print the regression equation; a and b are one-element arrays, so take the scalar
y_hat = a * x + b
print(f"Regression equation: y = {a.item():.4f}x {b.item():+.4f}")
print("R2 =", model.score(x, y))

# Plot
plt.figure()
plt.scatter(x, y)
plt.plot(x, y_hat, color='r')  # draw the fitted line
plt.show()
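
Because the code above passes `y` as a 2D column, `model.coef_` and `model.intercept_` come back as arrays rather than plain numbers. A possible alternative, sketched here on synthetic data (the arrays and the seed are made up for illustration), is to pass a 1D target, which makes the fitted parameters easier to format:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data (an assumption; not the article's dataset)
rng = np.random.default_rng(2)
X = rng.normal(68.0, 2.0, size=(50, 1))  # features must be 2D: (n_samples, n_features)
y = 3.0 * X[:, 0] - 80.0 + rng.normal(0.0, 10.0, size=50)  # 1D target

model = LinearRegression().fit(X, y)

slope = float(model.coef_[0])        # coef_ has shape (1,) for a 1D target
intercept = float(model.intercept_)  # intercept_ is a scalar for a 1D target
r2 = model.score(X, y)               # coefficient of determination R^2

print(f"y = {slope:.4f}x {intercept:+.4f}, R2 = {r2:.4f}")
```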

[Figures: scatter plot with the fitted line for 20, 200, and 2000 data points]


Reposted from blog.csdn.net/changlingMYlove/article/details/120537067