35. Simple Linear Regression in Python

Approach:

1. Generate 20 evenly spaced numbers from 0 to 10 as x.

2. Apply the regression equation y = 2 + 5x + ε (the code below uses β0 = 2, β1 = 5).

3. Compute the y values, adding Gaussian noise ε.

4. Estimate the coefficients from the constructed data.

import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
nsample = 20
# Pick 20 evenly spaced numbers from 0 to 10
x = np.linspace(0, 10, nsample)
x
array([ 0.        ,  0.52631579,  1.05263158,  1.57894737,  2.10526316,
        2.63157895,  3.15789474,  3.68421053,  4.21052632,  4.73684211,
        5.26315789,  5.78947368,  6.31578947,  6.84210526,  7.36842105,
        7.89473684,  8.42105263,  8.94736842,  9.47368421, 10.        ])
# For ordinary least squares, prepend a column of 1s to x so it multiplies the constant (intercept) term
X = sm.add_constant(x)
X
array([[ 1.        ,  0.        ],
       [ 1.        ,  0.52631579],
       [ 1.        ,  1.05263158],
       [ 1.        ,  1.57894737],
       [ 1.        ,  2.10526316],
       [ 1.        ,  2.63157895],
       [ 1.        ,  3.15789474],
       [ 1.        ,  3.68421053],
       [ 1.        ,  4.21052632],
       [ 1.        ,  4.73684211],
       [ 1.        ,  5.26315789],
       [ 1.        ,  5.78947368],
       [ 1.        ,  6.31578947],
       [ 1.        ,  6.84210526],
       [ 1.        ,  7.36842105],
       [ 1.        ,  7.89473684],
       [ 1.        ,  8.42105263],
       [ 1.        ,  8.94736842],
       [ 1.        ,  9.47368421],
       [ 1.        , 10.        ]])
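As a side note, the same design matrix can be built by hand in NumPy; a minimal sketch (X_manual is an illustrative name, not from the original code):

# Equivalent to sm.add_constant(x): a column of ones, then x
X_manual = np.column_stack([np.ones(nsample), x])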
# Construct the true coefficients: β0 = 2, β1 = 5
beta = np.array([2, 5])
beta
array([2, 5])
# Construct the error term: draws from a standard Gaussian distribution
e = np.random.normal(size=nsample)
e
array([-0.08130226, -0.99898515, -0.46717904, -0.52487297, -0.85998302,
        1.00102852,  0.61557834,  0.4359724 ,  1.36966089, -0.17069984,
        0.33877027, -1.602145  , -0.1940928 ,  1.58914167, -2.09103106,
       -0.87802483, -0.46069062, -2.32511203, -1.42386623, -0.22494043])
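Note that the noise comes from an unseeded generator, so the exact values differ on every run. For reproducible output you could seed NumPy first; a minimal sketch (the seed value 0 is arbitrary):

# Fix the random seed so e (and everything computed from it) is repeatable
np.random.seed(0)
e = np.random.normal(size=nsample)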
# Actual values: y = β0 + β1*x + e, the constructed "ground truth" used for fitting
y = np.dot(X, beta) + e
y
array([ 1.91869774,  3.6325938 ,  6.79597886,  9.36986387, 11.66633277,
       16.15892325, 18.40505202, 20.85702504, 24.42229247, 25.51351069,
       28.65455974, 29.34522342, 33.38485457, 37.79966799, 36.75107421,
       40.59565938, 43.64457254, 44.41173008, 47.94455482, 51.77505957])

With the data constructed, fit the regression equation.

# Ordinary least squares
model = sm.OLS(y, X)

# Fit the data
res = model.fit()

# Regression coefficients, i.e. β0 and β1
res.params

array([2.15061173, 4.90034992])
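The estimates land close to the true β0 = 2 and β1 = 5. As a cross-check, the same least-squares solution can be computed directly with NumPy; a minimal sketch (beta_hat is an illustrative name):

# Solve min ||X·b - y||^2 directly; should match res.params
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
beta_hat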
# View the full set of evaluation results
res.summary()
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.996
Model:                            OLS   Adj. R-squared:                  0.995
Method:                 Least Squares   F-statistic:                     4072.
Date:                Thu, 13 Sep 2018   Prob (F-statistic):           1.15e-22
Time:                        10:44:47   Log-Likelihood:                -28.152
No. Observations:                  20   AIC:                             60.30
Df Residuals:                      18   BIC:                             62.30
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          2.1506      0.449      4.788      0.000       1.207       3.094
x1             4.9003      0.077     63.815      0.000       4.739       5.062
==============================================================================
Omnibus:                        0.468   Durbin-Watson:                   1.957
Prob(Omnibus):                  0.791   Jarque-Bera (JB):                0.572
Skew:                           0.274   Prob(JB):                        0.751
Kurtosis:                       2.378   Cond. No.                         11.5
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
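The headline numbers in the summary are also available as attributes on the results object, which is convenient when you only need one or two of them; a minimal sketch using standard statsmodels attributes:

res.rsquared    # R-squared, 0.996 above
res.bse         # standard errors of the coefficients
res.conf_int()  # 95% confidence intervals for β0 and β1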

# Fitted values
y_ = res.fittedvalues
y_
array([ 2.15061173,  4.72974327,  7.30887481,  9.88800634, 12.46713788,
       15.04626942, 17.62540096, 20.2045325 , 22.78366403, 25.36279557,
       27.94192711, 30.52105865, 33.10019019, 35.67932172, 38.25845326,
       40.8375848 , 43.41671634, 45.99584788, 48.57497942, 51.15411095])
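Equivalently, the fitted values can be produced with the results object's predict method, which also accepts a new design matrix; a minimal sketch:

# On the training design matrix this reproduces res.fittedvalues
y_pred = res.predict(X)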
# Plot
fig, ax = plt.subplots(figsize=(8, 6))
ax.plot(x, y, 'o', label='data')    # original data
ax.plot(x, y_, 'r--', label='test') # fitted line
ax.legend(loc='best')
plt.show()
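As a quick diagnostic beyond the fit line, you can plot the residuals against x; for a well-specified linear model they should scatter randomly around zero. A minimal sketch using res.resid:

# Residuals should show no visible pattern if the linear model is adequate
fig, ax = plt.subplots(figsize=(8, 6))
ax.plot(x, res.resid, 'o')
ax.axhline(0, color='gray', linestyle='--')
ax.set_xlabel('x')
ax.set_ylabel('residual')
plt.show()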


Reposted from blog.csdn.net/xzy53719/article/details/82774089