Data Mining -- Mining and Modeling - Time Series Patterns - ARIMA Model

Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission. https://blog.csdn.net/Ericsson_Liu/article/details/82528886

Code and data source: Python数据分析与挖掘实战 (Python Data Analysis and Mining in Practice)

Raw data (column names as they appear in the file: 日期 = date, 销量 = sales volume):

日期 销量
1/1/2015 3023
1/2/2015 3039
1/3/2015 3056
1/4/2015 3138
1/5/2015 3188
1/6/2015 3224
1/7/2015 3226
1/8/2015 3029
1/9/2015 2859
1/10/2015 2870
1/11/2015 2910
1/12/2015 3012
1/13/2015 3142
1/14/2015 3252
1/15/2015 3342
1/16/2015 3365
1/17/2015 3339
1/18/2015 3345
1/19/2015 3421
1/20/2015 3443
1/21/2015 3428
1/22/2015 3554
1/23/2015 3615
1/24/2015 3646
1/25/2015 3614
1/26/2015 3574
1/27/2015 3635
1/28/2015 3738
1/29/2015 3707
1/30/2015 3827
1/31/2015 4039
2/1/2015 4210
2/2/2015 4493
2/3/2015 4560
2/4/2015 4637
2/5/2015 4755
2/6/2015 4817
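
The listing further down reads these values from '../data/arima_data.xls'. If that file is not at hand, the same table can be rebuilt directly from the numbers above. This is only a sketch: it assumes the dates are consecutive days starting 1/1/2015 (which matches the table) and reuses the column names 日期 and 销量 so the rest of the code would work unchanged.

```python
import pandas as pd

# Sales values copied from the raw data table, in date order
sales = [
    3023, 3039, 3056, 3138, 3188, 3224, 3226,
    3029, 2859, 2870, 2910, 3012, 3142, 3252,
    3342, 3365, 3339, 3345, 3421, 3443, 3428,
    3554, 3615, 3646, 3614, 3574, 3635, 3738,
    3707, 3827, 4039, 4210, 4493, 4560, 4637,
    4755, 4817,
]

# One row per day, starting on 2015-01-01 (month/day/year in the table)
dates = pd.date_range("2015-01-01", periods=len(sales), freq="D")

data = pd.DataFrame({u"销量": sales}, index=dates)
data.index.name = u"日期"

print(data.shape)
print(data.tail(3))
```

An explicit daily frequency on the index is a choice made here, not something taken from the original file.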

Code:

# -*- coding: utf-8 -*-

from __future__ import print_function

import sys
if sys.version_info[0] == 2:
    # Python 2 only; avoids "UnicodeEncodeError: 'ascii' codec can't encode characters in position 2-3: ordinal not in range(128)"
    reload(sys)
    sys.setdefaultencoding('utf-8')

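import pandas as pd

# Input file and forecast horizon (number of days to predict)
discfile = '../data/arima_data.xls'
forecastnum = 5

# Read the data, using the date column (日期) as the index
data = pd.read_excel(discfile, index_col=u'日期')

# Time series plot of the original series
import matplotlib.pyplot as plt

plt.rcParams['font.sans-serif'] = ['SimHei']  # font that can render the Chinese labels
plt.rcParams['axes.unicode_minus'] = False  # render the minus sign correctly
data.plot()
plt.show()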


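# Autocorrelation (ACF) plot of the original series
from statsmodels.graphics.tsaplots import plot_acf
plot_acf(data)
plt.show()

# Stationarity (ADF unit root) test on the original series
from statsmodels.tsa.stattools import adfuller as ADF
print(u'ADF test result for the original series: ', ADF(data[u'销量']))
# With the default arguments the return value is: adf, pvalue, usedlag, nobs, critical values, icbest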

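# First-order difference of the series
D_data = data.diff().dropna()
D_data.columns = [u'销量差分']  # "differenced sales"
D_data.plot()  # time series plot
plt.show()
plot_acf(D_data)  # autocorrelation plot
plt.show()

from statsmodels.graphics.tsaplots import plot_pacf
plot_pacf(D_data)  # partial autocorrelation plot
plt.show()
print(u'ADF test result for the differenced series: ', ADF(D_data[u'销量差分']))  # stationarity test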


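# White-noise (Ljung-Box) test on the differenced series
from statsmodels.stats.diagnostic import acorr_ljungbox
print(u'White-noise test result for the differenced series: ', acorr_ljungbox(D_data, lags=1))  # test statistic and p-value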

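# statsmodels 0.13 removed statsmodels.tsa.arima_model; use it only as a fallback on older versions
try:
    from statsmodels.tsa.arima.model import ARIMA
except ImportError:
    from statsmodels.tsa.arima_model import ARIMA

data[u'销量'] = data[u'销量'].astype(float)  # fit on floating-point values

# Order selection: search p and q from 0 up to len/10
pmax = int(len(D_data)/10)
qmax = int(len(D_data)/10)

bic_matrix = []  # BIC of every (p, q) combination

for p in range(pmax+1):
    tmp = []
    for q in range(qmax+1):
        try:
            tmp.append(ARIMA(data, order=(p,1,q)).fit().bic)
        except Exception:  # some (p, q) combinations fail to fit
            tmp.append(float('nan'))

    bic_matrix.append(tmp)

bic_matrix = pd.DataFrame(bic_matrix)  # rows are p, columns are q
p,q = bic_matrix.stack().idxmin()  # flatten with stack, then locate the minimum
print(u'p and q with the smallest BIC: %s, %s' % (p,q))

model = ARIMA(data, order=(p,1,q)).fit()  # fit ARIMA(p, 1, q)
print(model.summary())  # model report
# Forecast the next forecastnum days; older statsmodels returns (forecast, standard error, confidence interval), newer versions return the point forecasts
print(model.forecast(forecastnum))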
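
The middle value 1 in the (p,1,q) order passed to ARIMA is the differencing order: the model is fitted to the first difference of the series (the same quantity as D_data), and forecasts are then expressed back on the original scale. As a rough illustration of what differencing does and how it is undone, here is a small sketch on the first seven sales values from the raw data. It only shows the arithmetic and is not how statsmodels implements it internally.

```python
import pandas as pd

# First seven sales values from the raw data table
sales = pd.Series([3023, 3039, 3056, 3138, 3188, 3224, 3226], dtype=float)

# First-order difference: d_t = x_t - x_{t-1}; the first point has no predecessor, so it is dropped
diffed = sales.diff().dropna()
print(diffed.tolist())  # [16.0, 17.0, 82.0, 50.0, 36.0, 2.0]

# Undo the differencing: cumulative sum of the differences plus the first observation
restored = diffed.cumsum() + sales.iloc[0]
print(restored.tolist())  # [3039.0, 3056.0, 3138.0, 3188.0, 3224.0, 3226.0]
```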

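Figure 1: Time series plot of the original series

Figure 2: Autocorrelation plot of the original series

Figure 3: Time series plot of the series after first-order differencing

Figure 4: Autocorrelation plot of the series after first-order differencing

Figure 5: Partial autocorrelation plot of the series after first-order differencing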
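
For reading Figures 2 and 4 it helps to know what an autocorrelation plot shows: for each lag k, the correlation between the series and a copy of itself shifted by k steps. Autocorrelations that stay large over many lags are commonly taken as a sign of non-stationarity, which is why the series is differenced before the model is fitted. The sketch below computes the usual sample estimator by hand on a toy trending series; sample_acf is a hypothetical helper written for this illustration, and the details may differ from what plot_acf computes.

```python
import numpy as np

def sample_acf(x, nlags):
    """Sample autocorrelation r_0..r_nlags using deviations from the overall mean."""
    x = np.asarray(x, dtype=float)
    dev = x - x.mean()
    denom = np.sum(dev ** 2)  # lag-0 term, so that r_0 == 1
    return np.array([
        np.sum(dev[: len(x) - k] * dev[k:]) / denom
        for k in range(nlags + 1)
    ])

# Toy series with a linear trend (illustrative values, not the sales data)
toy = [1, 2, 3, 4, 5]
print(sample_acf(toy, 2))  # lag 0 is 1.0, lag 1 is 0.4, lag 2 is -0.1
```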
