Basic time series operations: using Python and EViews to determine the order of, and forecast with, AR and ARMA models

Basic workflow for general time series processing (no seasonal trend):

Note: the correction factor in the LB statistic in the figure above is n*(n+2), not n*(n-2).

Several basic time series models (see the end of the article for their specific forms):

Table of contents

1. Python processing

1.1.step1: Stationarity test and white noise test

1.1.1 Stationarity test: ADF test

1.1.2 Difference correction:

1.1.3 White noise test: L-B statistic/Q statistic

1.2.step2: Model identification and order determination

1.2.1 Method 1: Observe the tailing and truncation of ACF and PACF

1.2.2 Method 2: AIC and BIC information standards

1.3.step3: Model construction and forecasting

1.4.step4: Model testing

2. Eviews processing

2.1.step1: Stationarity test and white noise test

2.2.step2: Model identification and order determination

2.3.step3: Model construction and forecasting

2.3.1 Model construction:

2.3.2 Model forecast:

2.4.step4: Model testing

2.5 Testing and establishment of ARCH model

ARCH effect test: LM test

Build an ARCH model for the residuals

3. Python practice process

4. Forms of common time series models


The data from exercise 17 on pages 94-96 of "Applied Time Series Analysis" (Wang Yan, fourth edition) are used as the example.

Basic introduction:

1. Python processing

Key reference articles:

Time series model (ARIMA and ARMA) complete steps detailed_Foneone's blog-CSDN blog_arma_order_select_ic

1.1.step1: Stationarity test and white noise test

First, draw a line chart to observe the data directly and get a rough sense of stationarity. For this series there is neither trend nor cycle;

df.plot(color='blue',title='data-17')  # line plot of the time series

1.1.1 Stationarity test: ADF test

· Test hypotheses: H0: there is a unit root vs. H1: there is no unit root

If the series is stationary, there should be no unit root, so we hope to reject the null hypothesis

· python code: adfuller

from statsmodels.tsa.stattools import adfuller
adftest = adfuller(x, autolag='AIC')  # ADF test

The test results are as follows:

adftest[0] is the ADF test statistic; adftest[1] is its p value; adftest[4] is a dictionary of critical values at the 1%, 5% and 10% levels, against which adftest[0] can be compared directly;

Judging from this output, -5.7185 is well below all three critical values and the p value is close to 0, so the null hypothesis is rejected at the 1% significance level and the series is considered stationary;
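As a minimal sketch of reading these return values (an addition; adftest is the result obtained above, and the variable names here are only illustrative):

adf_stat, p_value, critical_values = adftest[0], adftest[1], adftest[4]
print('ADF statistic:', adf_stat)           # e.g. -5.7185 for this data
print('p value:', p_value)
print('critical values:', critical_values)  # dict with keys '1%', '5%', '10%'
if adf_stat < critical_values['1%']:
    print('reject H0 at the 1% level: the series can be treated as stationary')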

1.1.2 Difference correction:

If the series is not stationary, consider differencing it into a stationary series. If the differenced series is stationary, an ARIMA model is fitted to the original series.

· python code: timeseries.diff()

#### Differencing
def diff(timeseries):
    d1_sale=timeseries.diff(periods=1).dropna()  # dropna removes NaN values
    d1_sale.plot(color='orange',title='diff1')
    return d1_sale

1.1.3 White noise test: L-B statistic/Q statistic

· Testing hypotheses and statistics:

("Residual" in a. above should be replaced by "sequence") Usually m is taken as [n/10] or [root n]. If the observed amount is small, it can also be taken as [n/4]; if the null hypothesis is rejected It is considered not a white noise test

· python code: acorr_ljungbox

from statsmodels.stats.diagnostic import acorr_ljungbox  # white noise test
test_value = acorr_ljungbox(timeseries, lags=1)

The test results are as follows:

test_value[1] is the p value; since it is below 0.05, the series can be considered not to be white noise at the 5% significance level.
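Note (an addition, version-dependent): in recent statsmodels releases acorr_ljungbox returns a DataFrame rather than a tuple, so the indexing above may need to change, for example:

lb = acorr_ljungbox(timeseries, lags=[1])  # DataFrame with columns lb_stat and lb_pvalue in newer versions
# p value at lag 1: lb['lb_pvalue'].iloc[0]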

1.2.step2: Model identification and order determination

1.2.1 Method 1: Observe the tailing and truncation of ACF and PACF

· python code:

plot_acf(timeseries,lags) #lags: delay order

plot_pacf(timeseries,lags)

import statsmodels.api as sm
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf,plot_pacf
def determinate_order_acf(timeseries):  
    plot_acf(timeseries,lags=30)  # choose the number of lags yourself
    plot_pacf(timeseries,lags=30)
    plt.show()

The output is as follows. If the PACF cuts off at lag p while the ACF tails off, an AR(p) model is suggested; if the ACF cuts off at lag q while the PACF tails off, an MA(q) model is suggested; if both tail off, consider ARMA(p, q).

1.2.2 Method 2: AIC and BIC information standards

Estimate the model over different values of p and q, and select the p and q with the smallest AIC and BIC;

Note: AIC may overestimate the order, and BIC may underestimate the order.

· python code: sm.tsa.arma_order_select_ic

#AIC
AIC_summary=sm.tsa.arma_order_select_ic(timeseries,max_ar=4,max_ma=0,ic='aic')
#BIC
BIC_summary=sm.tsa.arma_order_select_ic(timeseries,max_ar=4,max_ma=0,ic='bic')

Output result:

To obtain the AIC- and BIC-optimal orders directly, index the result with ['aic_min_order'] (or ['bic_min_order']):

#AIC
AIC=sm.tsa.arma_order_select_ic(timeseries,max_ar=ar_max,max_ma=ma_max,
                                    ic='aic')['aic_min_order']
#BIC
BIC=sm.tsa.arma_order_select_ic(timeseries,max_ar=ar_max,max_ma=ma_max,
                                    ic='bic')['bic_min_order']
print('the AIC-optimal order is {},\nthe BIC-optimal order is {}\n'.format(AIC,BIC))

1.3.step3: Model construction and forecasting

· Model construction: arma_model = ARMA(train_data, order).fit()

· Model forecast: arma_model.forecast(pred_end)

#### Model construction: in the spirit of machine-learning model fitting (?)
def ARMA_model(train_data,order,pred_end): # train_data: the (training/test) data to fit; order: the chosen order
    arma_model = ARMA(train_data,order).fit()  # fit the ARMA model
    # fitting results
    print(arma_model.summary())
    # use the standard deviation of the residual series as a tentative estimate of sigma (not sure this is right)
    print('estimated standard deviation sigma of the random disturbance term:',np.std(arma_model.resid))
    #out_sample_pred = arma_model.predict(start=len(train_data),end = len(train_data)+pred_end-1,dynamic=True)
    #print(out_sample_pred)
    print('forecast for the next {} periods:\npoint forecasts\n{}\nstandard errors\n{}\nconfidence intervals:\n{}'.format(pred_end,
                                                            arma_model.forecast(pred_end)[0],
                                                            arma_model.forecast(pred_end)[1],
                                                            arma_model.forecast(pred_end)[2]))

Drawback: the fitted ARMA result does not seem to directly report an estimate of the variance of the random disturbance term (here the standard deviation of the output residual series arma_model.resid is computed instead). Fitting an ARIMA model seems to give an estimate of sigma (? not sure — I have not learned the ARIMA workflow yet).
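A possible workaround (an assumption, depending on the statsmodels version, not from the original article): the fitted result object may expose the estimated innovation variance directly as sigma2, which can be compared with the residual-based value above.

# if your statsmodels version provides it, sigma2 is the estimated variance of the disturbance term
print('sigma2 estimate:', arma_model.sigma2)
print('sigma estimate:', np.sqrt(arma_model.sigma2))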

Output result:

1.4.step4: Model testing

Run another white noise test on the residual series; other checks include a Q-Q plot test of normality (to be added). A sketch is given below.
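A minimal sketch of these residual checks (an addition, assuming arma_model is the fitted result from step 3):

import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.stats.diagnostic import acorr_ljungbox

resid = arma_model.resid                    # residual series of the fitted model
print(acorr_ljungbox(resid, lags=[6, 12]))  # Ljung-Box white noise test on the residuals
sm.qqplot(resid, line='q')                  # Q-Q plot of the residuals against the normal distribution
plt.show()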

2. Eviews processing

2.1.step1: Stationarity test and white noise test

2.1.1 Stationarity test: ADF test (Unit Root Test)

2.1.2 White noise test: open the Correlogram and directly read the Q statistics and p values of the ACF. For this question the output is as shown below: the null hypothesis can be rejected, i.e. the series is non-random. If the p value were large, the null hypothesis would be accepted and the series regarded as white noise (but in that case, check the line graph of the series for heteroskedasticity; if heteroskedasticity is present, the series cannot be treated as white noise).

2.2.step2: Model identification and order determination

Method 1: observe the tailing-off and cutting-off of the ACF and PACF, which can be read directly from the Correlogram.

Method 2: AIC and BIC criteria: EViews has no automatic iteration to find the minimum AIC and BIC. You can only fit several candidate models separately; the output of each fit contains its AIC value, which you then compare manually;

2.3.step3: Model construction and forecasting

2.3.1 Model construction:

Some additional modeling commands:

MA(1) model: ls x c ma(1)

Generate first differences: genr->d(x) or x-x(-1)

For the estimated standard deviation of the random disturbance term: use the descriptive statistics of the resid series and take Std. Dev. as the estimate;

2.3.2 Model forecast:

First extend the range of the series (expand the workfile range), then click Forecast on the fitted equation and select the forecast range; the newly generated series contains the forecast results;

Note: EViews does not give the specific values of the forecast errors; it only shows a plot with the confidence band;

2.4.step4: Model testing

White noise test of the residuals: select r (the residual series) in the workfile and open its Correlogram; observe the Q statistics and p values of the ACF.

2.5 Testing and establishment of ARCH model

(A new set of data is used here)

The form of the ARCH model:
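(The original figure with the model form is not reproduced here; the standard ARCH(q) specification is)

x_t = f(t, x_{t-1}, x_{t-2}, \dots) + \varepsilon_t, \qquad \varepsilon_t = \sqrt{h_t}\, e_t, \qquad h_t = \omega + \sum_{i=1}^{q} \alpha_i \varepsilon_{t-i}^{2},

where e_t is i.i.d. with zero mean and unit variance, \omega > 0 and \alpha_i \ge 0; h_t is the conditional variance of the disturbance.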

step1: Establish a suitable main regression model according to the previous method

step2: ARCH effect test: LM test

Perform the LM test for ARCH effects on the residuals of the main model

View → Residual Diagnostics → Heteroskedasticity Tests → ARCH, then set the lag order yourself under "lags"

LM test: H0: the residuals have no ARCH-type heteroskedasticity (no ARCH effect)

Observe the p value of the test: if it is very small, the null hypothesis is rejected and the residuals are considered heteroskedastic, i.e. an ARCH effect is present.

 step3: Establish an ARCH model for the residuals
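For reference, a Python counterpart of steps 2-3 (a sketch under assumptions: resid is the residual series of the main model; the LM test uses statsmodels' het_arch and the ARCH fit uses the third-party arch package, neither of which appears in the original article):

from statsmodels.stats.diagnostic import het_arch  # Engle's ARCH-LM test
from arch import arch_model                        # third-party 'arch' package

# step2 analogue: LM test for ARCH effects (H0: no ARCH effect in the residuals)
lm_stat, lm_pvalue, f_stat, f_pvalue = het_arch(resid)
print('LM statistic:', lm_stat, 'p value:', lm_pvalue)  # a small p value suggests an ARCH effect

# step3 analogue: fit an ARCH(1) model to the residuals if the LM test rejects H0
arch_fit = arch_model(resid, mean='Zero', vol='ARCH', p=1).fit(disp='off')
print(arch_fit.summary())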

3. Python practice process 

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt  # plotting
import statsmodels.api as sm

from statsmodels.graphics.tsaplots import plot_acf,plot_pacf  # ACF and PACF
from statsmodels.tsa.stattools import adfuller  # ADF test
from statsmodels.stats.diagnostic import acorr_ljungbox  # white noise test
from statsmodels.tsa.arima_model import ARMA  # ARMA model (removed in recent statsmodels; the newer interface is statsmodels.tsa.arima.model.ARIMA)
from statsmodels.tsa.arima_model import ARIMA  # ARIMA model

step1: Stationarity test and white noise test

#### Stationarity test
def ADF_test(timeseries):
    x = np.array(timeseries)  # convert to a 1-D array
    adftest = adfuller(x, autolag='AIC')  # ADF test
    print (adftest) 
    if adftest[0] < adftest[4]["1%"] and adftest[1] < 10**(-6): 
    # compare the ADF statistic with the 1% critical value, and check that the p value is very close to 0 (the smaller the better)
        print("the series is stationary")
        return True 
    else:
        print("non-stationary series")
        return False
    
#### Randomness test (white noise test)
def random_test(timeseries) : 
    p_value = acorr_ljungbox(timeseries, lags=1)  # returns (statistics, p values); the second element holds the p values
    #print(p_value)
    if p_value[1] < 0.05: 
        print("non-random series")
        return  True
    else:
        print("random series, i.e. a white noise series")
        return False
"""
————————————————
版权声明:本文为CSDN博主「Foneone」的原创文章,遵循CC 4.0 BY-SA版权协议,转载请附上原文出处链接及本声明。
原文链接:https://blog.csdn.net/foneone/article/details/90141213
"""

Note: when I actually ran the ADF test above, its output differed considerably from the ADF statistics produced by EViews; I have not yet figured out where the discrepancy lies.

Differencing operation:

#### Differencing
def diff(timeseries):
    d1_sale=timeseries.diff(periods=1).dropna()  # dropna removes NaN values
    d1_sale.plot(color='orange',title='diff1')
    return d1_sale

 Step2: Model selection and order determination

Model selection: plotting the ACF and PACF

#### Plot the ACF and PACF
def plot_acf_pacf(timeseries): # use the ACF and PACF to judge the model order
    plot_acf(timeseries,lags=30)  # number of lags
    plot_pacf(timeseries,lags=30)
    plt.show()

 AIC and BIC order determination: two methods

# Method 1: use the built-in function arma_order_select_ic (for an ARIMA model, pass in the differenced series).

#Method 2: Iterative tuning

Method 2 reference article:Time series analysis ARIMA and its Python implementation_Jupiter Talkative Expert's Blog-CSDN Blog_arima python implementation

#### Order determination by AIC and BIC
# Method 1: use the built-in function
def detetminante_order(timeseries,ar_max,ma_max): # order determination by information criteria: AIC, BIC
    #AIC
    AIC_summary=sm.tsa.arma_order_select_ic(timeseries,max_ar=ar_max,max_ma=ma_max,ic='aic')
    #BIC
    BIC_summary=sm.tsa.arma_order_select_ic(timeseries,max_ar=ar_max,max_ma=ma_max,ic='bic')
    print('the AIC-optimal order is {},\nthe BIC-optimal order is {}\n'.format(AIC_summary['aic_min_order'],BIC_summary['bic_min_order']))
    print(AIC_summary,'\n',BIC_summary)
    
# Method 2: iterative search
def determine_aic_arima(train_data,pmax,I,qmax):
    aic_matrix=[]
    for p in range(pmax+1):
        tmp=[]
        for q in range(qmax+1):
            try:
                tmp.append(ARIMA(train_data,order=(p,I,q)).fit().aic)
            except:
                tmp.append(None)
        aic_matrix.append(tmp)
    aic_matrix=pd.DataFrame(aic_matrix)    
    p,q=aic_matrix.stack().idxmin()  # index of the minimum value
    print('The optimal p found by the AIC method is %d and q is %d'%(p,q))
    
def determine_bic_arima(train_data,pmax,I,qmax):
    bic_matrix=[]
    for p in range(pmax+1):
        tmp=[]
        for q in range(qmax+1):
            try:
                tmp.append(ARIMA(train_data,order=(p,I,q)).fit().bic)
            except:
                tmp.append(None)
        bic_matrix.append(tmp)
    bic_matrix=pd.DataFrame(bic_matrix)    
    p,q=bic_matrix.stack().idxmin()  # index of the minimum value
    print('The optimal p found by the BIC method is %d and q is %d'%(p,q))

 step3: Model construction and forecasting

If you need to fit an ARIMA model instead, just replace the call with ARIMA(train_data, order).fit(), where order is the triple (p, d, q).

#### Model construction: in the spirit of machine-learning model fitting (?)
def ARMA_model(train_data,order,pred_end): # train_data: the (training/test) data to fit; order: the chosen order
    arma_model = ARMA(train_data,order).fit()  # fit the ARMA model
    print(arma_model.summary())  # fitting results
    print('estimated standard deviation sigma of the random disturbance term:',np.std(arma_model.resid))  # std of the residual series as a tentative sigma estimate (not sure this is right)
    print('forecast for the next {} periods:\npoint forecasts\n{}\nstandard errors\n{}\nconfidence intervals:\n{}'.format(pred_end,
                                                            arma_model.forecast(pred_end)[0],
                                                            arma_model.forecast(pred_end)[1],
                                                            arma_model.forecast(pred_end)[2]))

The complete run:

df1=pd.read_excel(r'F:\个人嘿嘿嘿\北师大BNU\研一上-课业资料\时间序列\作业2\timeseries17.xlsx')
df17=df1['cq']

df17.plot(color='blue',title='data-17')  # line plot of the time series
plot_acf_pacf(df17)             # ACF and PACF plots of the series
ADF_test(df17)      # stationarity test
random_test(df17)   # white noise test
detetminante_order(df17,4,0)        # order determination by AIC and BIC
ARMA_model(df17,(1,0),5)                # model fitting and forecasting

4. Forms of common time series models

(1) AR model (autoregressive model)

Models the relationship between the current value of the series and its own past values.

(2) MA model (moving average model)

Models the relationship between the current value of the series and past noise (disturbance) terms.

(3) ARMA model (autoregressive moving average model)
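The specific forms of these three models, in standard textbook notation (the original figures are not reproduced; sign conventions for the MA coefficients vary across textbooks):

AR(p): x_t = \phi_0 + \phi_1 x_{t-1} + \cdots + \phi_p x_{t-p} + \varepsilon_t

MA(q): x_t = \mu + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \cdots - \theta_q \varepsilon_{t-q}

ARMA(p, q): x_t = \phi_0 + \sum_{i=1}^{p} \phi_i x_{t-i} + \varepsilon_t - \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}

where \varepsilon_t is a white noise sequence.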

(4) ARIMA model: for trending time series

(5) Multiplicative (product) seasonal ARMA model: for time series with seasonal cycles

(6) Multiplicative (product) seasonal ARIMA model: for time series with both trend and seasonal cycles

(7) ARCH model + GARCH model

ARCH model: autoregressive conditional heteroskedasticity model

Feature: when the residuals of the regression model are heteroskedastic, the conditional variance of the residual is modelled as a function of past squared residuals, so that the standardized disturbance satisfies the homoskedasticity assumption;

GARCH model: generalized autoregressive conditional heteroskedasticity model, which additionally includes lagged conditional variance terms


Origin: blog.csdn.net/qq_59613072/article/details/127867440