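Copyright notice: This is the blogger's original article and may not be reproduced without the blogger's permission. https://blog.csdn.net/Ericsson_Liu/article/details/82528886
Code and data source: Python数据分析与挖掘实战 (Python Data Analysis and Mining in Practice)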
Raw data:
日期 (Date) | 销量 (Sales) |
1/1/2015 | 3023 |
1/2/2015 | 3039 |
1/3/2015 | 3056 |
1/4/2015 | 3138 |
1/5/2015 | 3188 |
1/6/2015 | 3224 |
1/7/2015 | 3226 |
1/8/2015 | 3029 |
1/9/2015 | 2859 |
1/10/2015 | 2870 |
1/11/2015 | 2910 |
1/12/2015 | 3012 |
1/13/2015 | 3142 |
1/14/2015 | 3252 |
1/15/2015 | 3342 |
1/16/2015 | 3365 |
1/17/2015 | 3339 |
1/18/2015 | 3345 |
1/19/2015 | 3421 |
1/20/2015 | 3443 |
1/21/2015 | 3428 |
1/22/2015 | 3554 |
1/23/2015 | 3615 |
1/24/2015 | 3646 |
1/25/2015 | 3614 |
1/26/2015 | 3574 |
1/27/2015 | 3635 |
1/28/2015 | 3738 |
1/29/2015 | 3707 |
1/30/2015 | 3827 |
1/31/2015 | 4039 |
2/1/2015 | 4210 |
2/2/2015 | 4493 |
2/3/2015 | 4560 |
2/4/2015 | 4637 |
2/5/2015 | 4755 |
2/6/2015 | 4817 |
Code:
#-*- coding: utf-8 -*-
from __future__ import print_function
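# Python 2 only: reset the default encoding to UTF-8 to avoid
# "UnicodeEncodeError: 'ascii' codec can't encode characters in position 2-3: ordinal not in range(128)".
# Python 3 needs none of this (reload is not a builtin there and sys.setdefaultencoding does not exist).
import sys
reload(sys)
sys.setdefaultencoding('utf-8')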
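import pandas as pd
discfile = '../data/arima_data.xls'  # path to the Excel data file
forecastnum = 5  # number of days to forecast
# Read the data, using the date column (u'日期') as the index
data = pd.read_excel(discfile, index_col=u'日期')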
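import matplotlib.pyplot as plt
plt.rcParams['font.sans-serif'] = ['SimHei']  # use a font that can render Chinese labels
plt.rcParams['axes.unicode_minus'] = False  # render the minus sign correctly
data.plot()  # time-series plot of the original series
plt.show()
from statsmodels.graphics.tsaplots import plot_acf
plot_acf(data).show()  # autocorrelation plot of the original series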
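from statsmodels.tsa.stattools import adfuller as ADF
# ADF unit-root test on the original series (column u'销量' means "sales")
print(u'ADF test result for the original series: ', ADF(data[u'销量']))
# With the default arguments the return value is: adf, pvalue, usedlag, nobs, critical values, icbest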
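# First-order difference; dropna() removes the leading NaN row
D_data = data.diff().dropna()
D_data.columns = [u'销量差分']  # column name means "sales difference"
D_data.plot()  # time-series plot of the differenced series
plt.show()
plot_acf(D_data).show()  # autocorrelation plot of the differenced series
from statsmodels.graphics.tsaplots import plot_pacf
plot_pacf(D_data).show()  # partial autocorrelation plot of the differenced series
print(u'ADF test result for the differenced series: ', ADF(D_data[u'销量差分']))
# Ljung-Box white-noise test on the differenced series
from statsmodels.stats.diagnostic import acorr_ljungbox
print(u'White-noise test result for the differenced series: ', acorr_ljungbox(D_data, lags=1))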
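# Note: statsmodels.tsa.arima_model was removed in statsmodels 0.13;
# newer releases provide ARIMA in statsmodels.tsa.arima.model with a different signature.
from statsmodels.tsa.arima_model import ARIMA
# Upper bounds for p and q: one tenth of the series length
pmax = int(len(D_data)/10)
qmax = int(len(D_data)/10)
bic_matrix = []  # bic_matrix[p][q] holds the BIC of ARIMA(p,1,q)
for p in range(pmax+1):
    tmp = []
    for q in range(qmax+1):
        try:
            tmp.append(ARIMA(data, (p,1,q)).fit().bic)
        except Exception:  # some (p,q) combinations fail to fit; record them as missing
            tmp.append(None)
    bic_matrix.append(tmp)
bic_matrix = pd.DataFrame(bic_matrix)
# stack() flattens the matrix into a Series indexed by (p, q); idxmin() gives the index of the smallest BIC
p,q = bic_matrix.stack().idxmin()
print(u'p and q with the smallest BIC: %s, %s' % (p,q))
model = ARIMA(data, (p,1,q)).fit()  # fit ARIMA(p,1,q)
print(model.summary2())  # model report
# Forecast forecastnum days ahead; returns the forecasts, standard errors and confidence intervals
print(model.forecast(forecastnum))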
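The order-selection loop in the script can be separated from statsmodels to make its logic easier to follow. The following is a minimal sketch, not the book's code: `select_order` and `fake_bic` are hypothetical names introduced here, and `fake_bic` stands in for `ARIMA(data, (p,1,q)).fit().bic` with invented toy numbers that only serve to exercise the loop.

```python
def select_order(fit_bic, pmax, qmax):
    """Return (p, q, bic) with the smallest BIC over p in 0..pmax, q in 0..qmax.

    fit_bic is a hypothetical callable (p, q) -> BIC that may raise when
    a model cannot be fitted; failed fits are skipped.
    """
    best = None
    for p in range(pmax + 1):
        for q in range(qmax + 1):
            try:
                bic = fit_bic(p, q)
            except Exception:
                continue  # plays the role of appending None in the script
            if best is None or bic < best[2]:
                best = (p, q, bic)
    if best is None:
        raise ValueError("no (p, q) combination could be fitted")
    return best


def fake_bic(p, q):
    """Toy stand-in for a model fit; every value here is invented."""
    if (p, q) == (2, 2):
        raise ValueError("pretend this fit did not converge")
    table = {(0, 0): 12.0, (0, 1): 7.5, (1, 0): 8.0, (1, 1): 9.0}
    return table.get((p, q), 10.0 + p + q)


best_p, best_q, best_bic = select_order(fake_bic, pmax=2, qmax=2)
print(best_p, best_q, best_bic)
```

Skipping failed fits gives the same result as storing `None` and letting `stack().idxmin()` ignore the missing cells; the explicit check for "nothing fitted" is an addition that the script does not have.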
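Figure 1: Time-series plot of the original series
Figure 2: Autocorrelation plot of the original series
Figure 3: Time-series plot of the series after first-order differencing
Figure 4: Autocorrelation plot of the series after first-order differencing
Figure 5: Partial autocorrelation plot of the series after first-order differencing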
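The differenced-series figures rest on two simple computations: first-order differencing and the sample autocorrelation, which also feeds the lag-1 Ljung-Box test used in the script. As a rough illustration of what `data.diff()`, `plot_acf` and `acorr_ljungbox(..., lags=1)` compute, here is a pure-Python sketch. The helper names `difference`, `acf` and `ljung_box_lag1` are hypothetical and are not statsmodels functions; the input is the first five sales values from the table.

```python
import math


def difference(xs):
    """First-order difference: x[t] - x[t-1]."""
    return [b - a for a, b in zip(xs, xs[1:])]


def acf(xs, lag):
    """Sample autocorrelation at the given lag (denominator uses all n terms)."""
    n = len(xs)
    mean = sum(xs) / float(n)
    dev = [x - mean for x in xs]
    denom = sum(d * d for d in dev)
    num = sum(dev[t] * dev[t + lag] for t in range(n - lag))
    return num / denom


def ljung_box_lag1(xs):
    """Ljung-Box Q statistic and p-value for lag 1 only."""
    n = len(xs)
    r1 = acf(xs, 1)
    q = n * (n + 2) * r1 * r1 / (n - 1)
    # Survival function of a chi-square variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(q / 2.0))
    return q, p_value


sales = [3023, 3039, 3056, 3138, 3188]  # first five rows of the table
d_sales = difference(sales)
print(d_sales)  # [16, 17, 82, 50]
print(acf(d_sales, 1))
print(ljung_box_lag1(d_sales))
```

Four differenced points are far too few for a meaningful test; the short input only keeps the arithmetic easy to check by hand. A small p-value would indicate that the differenced series is not white noise, which is the condition for fitting an ARMA model to it.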