From the Basics: How to Apply ARIMA to Time Series Analysis?

Original article by the author; reproduction without permission is prohibited. Contact marcnuth (AT) foxmail.com regarding reproduction.

1 Basic Concepts

1.1 Stationarity of a Time Series (Weak Stationarity)

1.1.1 Definition

Stationarity is a concept that runs throughout time series analysis; modeling with a non-stationary time series may produce spurious regression. How do we determine whether a time series is stationary? Stationarity is defined as follows:

  • The mean of the time series is a constant, independent of t
  • The variance of the time series is a constant, independent of t
  • The covariance of the time series depends only on the time difference (lag)

The following figure shows some non-stationary time series:

dmml_arima_unstationary_series.png

Further, note that white noise is a stationary time series.

1.1.2 Stationarity Tests

DF test / unit root test:

In the DF test, the time series is assumed to follow the model: $$ Y_t = \rho Y_{t-1} + \mu_t $$ where \(\mu_t\) is white noise. Differencing once gives: $$ \Delta Y_t = (\rho - 1) Y_{t-1} + \mu_t = \delta Y_{t-1} + \mu_t, \quad \text{where } \Delta Y_t = Y_t - Y_{t-1} $$

From this we can see that if \(\rho = 1\) (equivalently \(\delta = 0\)), the differenced sequence \(\Delta Y_t\) is stationary; we then call the original series a first-order integrated process, denoted \(I(1)\). Correspondingly, a series that is stationary without any differencing is denoted \(I(0)\).

More generally, we can add a constant term to the model above: $$ \Delta Y_t = \delta Y_{t-1} + \beta_1 + \mu_t $$ which is a more general form of the model.

The DF test is essentially a hypothesis test on the model parameter: the null hypothesis is ρ = 1, or equivalently δ = 0.

ADF test:

The ADF test extends the DF model by adding a time-trend term, giving the model: $$ \Delta Y_t = \delta Y_{t-1} + \beta_1 + \beta_2 t + \mu_t $$ where \(\beta_2\) is a constant and t is called the time trend variable. If the error term is autocorrelated, the equivalent model is: $$ \Delta Y_t = \delta Y_{t-1} + \sum_{i=1}^{m} \alpha_i \Delta Y_{t-i} + \beta_1 + \beta_2 t + \mu_t $$ This is the series model on which the ADF test is based.

You can run the ADF test in Python with adfuller from statsmodels; the second element of the returned tuple is the p-value. The following is a simple example:

import numpy as np
import statsmodels.api as sm

# ADF test on three series: white noise (stationary), a sine wave
# (stationary), and a linear trend (non-stationary)
print('=> test random data:')
print(sm.tsa.stattools.adfuller(np.random.randn(100)))
print('=> test sin:')
print(sm.tsa.stattools.adfuller(np.sin(np.arange(100))))
print('=> test line:')
print(sm.tsa.stattools.adfuller(np.arange(100, dtype=float)))
=> test random data:
(-9.2888038134047193, 1.1951897142974731e-15, 0, 99, {'10%': -2.5825959973472097, '1%': -3.4981980821890981, '5%': -2.8912082118604681}, 267.32615073491127)
=> test sin:
(-20050428159241372.0, 0.0, 3, 96, {'10%': -2.5830997960069446, '1%': -3.5003788874873405, '5%': -2.8921519665075235}, -6145.6382792775457)
=> test line:
(1.8671227576372333, 0.99847325083384997, 9, 90, {'10%': -2.5842101234567902, '1%': -3.5051901961591221, '5%': -2.894232085048011}, -6469.5381959604356)

1.1.3 How to Make a Time Series Stationary

Typically, real-life time series are not stationary. However, many commonly used time series algorithms require a stationary series, so to apply them we must first transform a non-stationary series into a stationary one. Before thinking about how to do that, we need to know what makes a time series non-stationary. In time series analysis, any series can be decomposed into three components:

  • White noise
  • Trend
  • Seasonality

Clearly, white noise does not affect stationarity. Thus, the factors that break stationarity are trend and seasonality.

How to remove trend from a series

Suppose a time series: $$ X_t = \epsilon_t + \mathrm{trend}_t $$

To make the series stationary, we can subtract \(\mathrm{trend}_t\) from \(X_t\). So the question becomes: how do we find the trend of a time series? Common approaches include:

  • MA (moving average): take \(\mathrm{trend}_t = \frac{1}{k} \sum_{i=t-k+1}^{t} X_i\)
  • Aggregation: aggregate the data over a period (e.g. month or year) to obtain the trend
  • Polynomial fitting: fit a polynomial to the series and treat it as the trend

Of these approaches, MA is the most common.
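The moving-average approach can be sketched as follows (the series, the window size k = 12, and the noise scale are all made up for illustration):

```python
import numpy as np
import pandas as pd

# Made-up example: a linear trend plus white noise
rng = np.random.default_rng(0)
x = pd.Series(0.1 * np.arange(200) + rng.normal(size=200))

# Estimate the trend with a centered k-point moving average,
# then subtract it from the series
k = 12
trend = x.rolling(window=k, center=True).mean()
detrended = (x - trend).dropna()

# The detrended series should now fluctuate around zero
print(round(detrended.mean(), 3))
```

After detrending, an ADF test on `detrended` should reject the unit-root hypothesis.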

How to remove seasonality from a series

Common ways to remove seasonality include:

  • Differencing: choose a lag k and take the lag-k difference of the series
  • Decomposition: decompose the series into trend and seasonal components

Differencing is the most common approach. For decomposition, refer to seasonal_decompose in statsmodels.

1.2 Random Walk

1.2.1 Definition

The random walk model defines the series as follows: $$ X_t = X_{t-1} + \epsilon_t $$ where \(\epsilon_t\) is the error at time t. From the model we can see that in a random walk, the value at the next moment depends only on the value at the current moment.

Doesn't this equation look familiar? Yes, it is the equation of the first-order integrated process we discussed in the previous section. So remember, a random walk series is characterized by:

  • Its next value depends only on the current value
  • It is a first-order integrated process \(I(1)\), i.e. a non-stationary series

To consolidate the concept of stationarity, let's prove that a random walk is not a stationary series.

1.2.2 Proof of Non-Stationarity

1. Does the mean change over time?

Expanding the random walk model recursively gives: $$ X_t = X_{t-1} + \epsilon_t = X_0 + \sum_{i=1}^{t} \epsilon_i $$

Thus the mean is: $$ E(X_t) = E(X_0) + \sum_{i=1}^{t} E(\epsilon_i) $$

Since \(\epsilon_i\) is random error, its mean is 0, so the mean of the random walk is \(E(X_0)\), a constant. Therefore, the mean of a random walk series does not vary with time.

2. Does the variance change over time?

The variance of the random walk is: $$ Var(X_t) = Var(X_0) + \sum_{i=1}^{t} Var(\epsilon_i) = 0 + t \sigma^2 $$

Since \(\sigma^2\), the variance of the random noise, is a constant, the variance of a random walk series does vary with time.

3. Does the covariance change over time?

From point 2 above, we already know the random walk is not a stationary series. So this part is left for you to prove. Hint:

$$ Cov(X_t, X_{t-k}) = E[(X_t - E(X_t))(X_{t-k} - E(X_{t-k}))] $$
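The properties above can also be checked numerically by simulating many random walk paths (a quick empirical sketch with arbitrary sizes, not a proof):

```python
import numpy as np

# Simulate 2000 random walks of 200 steps each, with X_0 = 0
rng = np.random.default_rng(42)
steps = rng.normal(size=(2000, 200))
walks = np.cumsum(steps, axis=1)   # X_t = epsilon_1 + ... + epsilon_t

mean_t = walks.mean(axis=0)        # stays near E(X_0) = 0 for every t
var_t = walks.var(axis=0)          # grows roughly like t * sigma^2 = t

print(round(var_t[9], 1), round(var_t[99], 1), round(var_t[199], 1))
```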

1.3 ACF

ACF stands for the autocorrelation function.

The ACF is defined as: $$ ACF(k) = \frac{E[(X_t - \mu)(X_{t+k} - \mu)]}{\sigma^2} = \frac{E[(X_t - \mu)(X_{t+k} - \mu)]}{\sqrt{E(X_t - \mu)^2} \sqrt{E(X_{t+k} - \mu)^2}} $$

The ACF takes values in [-1, 1]: 0 indicates no correlation, 1 perfect positive correlation, and -1 perfect negative correlation. Further, the ACF is symmetric about k = 0, i.e. ACF(k) = ACF(-k).

Assuming the series X is stationary, we can define the sample autocorrelation function: $$ \widehat{ACF}(k) = \frac{\sum_{t=k+1}^{n} (X_t - \bar{X})(X_{t-k} - \bar{X})}{\sum_{t=1}^{n} (X_t - \bar{X})^2} $$

Plotting this sample function against k gives the familiar ACF plot (correlogram).

1.4 PACF

PACF stands for the partial autocorrelation function.

The reason for introducing PACF in addition to ACF is that while ACF(k) gives the correlation at lag k, it does not eliminate the influence of the intermediate variables \(X_{t+1}, \ldots, X_{t+k-1}\). 3

Hence the PACF is defined as: $$ PACF(k) = Corr(X_t - \beta_1 X_{t-1} - \ldots - \beta_{k-1} X_{t-k+1},\ X_{t-k} - \beta_1 X_{t-k+1} - \ldots - \beta_{k-1} X_{t-1}) $$

2 ARIMA model

2.1 Introduction

ARIMA (Auto-Regressive Integrated Moving Average) is a model consisting of three parts:

  • AR model: requires the parameter p, the number of lagged terms. For example, if p = 3, then \(X_t\) is predicted from \(X_{t-1}, X_{t-2}, X_{t-3}\).
  • MA model: requires the parameter q, the number of lagged error terms. For example, if q = 3, then \(X_t\) is predicted from \(e_{t-1}, e_{t-2}, e_{t-3}\), where \(e_i\) is the error term at time i.
  • Differencing: requires the parameter d, the order of differencing. ARIMA needs differencing because the AR and MA models require the time series to be stationary!

2.1.1 AR model

The AR (autoregressive) model's core idea is that the current value depends on preceding values. The series model is: $$ AR(p): X_t = \sum_{i=1}^{p} \alpha_i X_{t-i} + \epsilon_t $$

Doesn't this model look familiar? When \(p = 1\) and \(\alpha_1 = 1\), it is exactly the random walk model. So it is worth noting that an AR model is not always stationary!

The following figure shows a generated AR(1) series with \(\alpha = 0.5\):

dmml_arima_ar_model_1.png
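A series like the one in the figure can be generated with statsmodels' ArmaProcess (a sketch; the exact data behind the plot is not available):

```python
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess

# AR(1) with alpha = 0.5: X_t = 0.5 * X_{t-1} + eps_t.
# ArmaProcess expects the lag polynomial, so the AR coefficient is negated.
np.random.seed(0)
ar_process = ArmaProcess(ar=np.array([1, -0.5]), ma=np.array([1]))
series = ar_process.generate_sample(nsample=500)

# The lag-1 sample autocorrelation should be close to alpha = 0.5
r1 = np.corrcoef(series[:-1], series[1:])[0, 1]
print(round(r1, 2))
```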

2.1.2 MA model

The MA (moving average) model is similar in form to the AR model, but MA expresses the current value in terms of past error terms, namely: $$ MA(q): X_t = \sum_{i=0}^{q} \beta_i \epsilon_{t-i} $$

It is worth noting that, unlike AR, an MA model is always stationary.

The following figure shows an MA series with \(\beta = 0.5\):

dmml_arima_ma_model_1.png

Comparing the AR and MA plots, you will find that in the MA model the values fall off rapidly; that is, over time the influence of a noise shock dies out quickly.
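This rapid decay can be verified on a simulated MA(1) series with \(\beta = 0.5\): the theoretical lag-1 autocorrelation is \(\beta / (1 + \beta^2) = 0.4\), and the ACF is zero beyond lag 1 (again a sketch, not the figure's actual data):

```python
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess

# MA(1): X_t = eps_t + 0.5 * eps_{t-1}
np.random.seed(1)
ma_process = ArmaProcess(ar=np.array([1]), ma=np.array([1, 0.5]))
series = ma_process.generate_sample(nsample=2000)

r1 = np.corrcoef(series[:-1], series[1:])[0, 1]  # ~ 0.4
r2 = np.corrcoef(series[:-2], series[2:])[0, 1]  # ~ 0: the shock dies out after one lag
print(round(r1, 2), round(r2, 2))
```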

2.1.3 Differencing

As mentioned earlier, the AR and MA models require the time series to be stationary before they can be applied. Naturally, their combination, the ARMA model, also requires a stationary series; in other words, ARMA is an approach to modeling stationary time series.

But in reality most time series are not stationary. As mentioned before, the most common way to transform a non-stationary series into a stationary one is differencing. The ARMA model combined with a differencing step is exactly the ARIMA model we are discussing. Therefore, ARIMA is an approach to modeling non-stationary time series.

2.2 A Guide to Choosing ARIMA Parameters

ARIMA(p, d, q) has three parameters: p is the AR order, q is the MA order, and d is the order of differencing.

2.2.1 Choosing d: the differencing order

The order of differencing can be chosen following these rules of thumb:

  1. If the series' ACF does not decay towards 0, the series needs differencing.
  2. If the differenced series' ACF(1) is zero or negative, or all autocorrelations are small and patternless, the series needs no further differencing. If ACF(1) is below -0.5, the series may be over-differenced; watch out for this case.
  3. The optimal order of differencing is usually the one whose differenced series has the smallest standard deviation.
  4. d = 0 means the original series is stationary; d = 1 means the series has a constant trend; d = 2 means the trend of the series varies over time.
  5. With d = 0, the model generally includes a constant term so the mean need not be 0; with d = 1, a constant term represents the trend; with d = 2, the model usually has no constant term.

The rules above help you judge, visually, whether a series needs differencing or has been over-differenced. In practice, we can also apply the stationarity tests discussed earlier to decide whether to difference. Further, in some cases the series may still fail the test after differencing for several values of d. In that case, try other approaches, such as taking the log of the series first and then choosing the differencing order.

2.2.2 Choosing p: the AR order

For an AR(p) model, ideally PACF(k) = 0 when k > p, i.e. the PACF cuts off after lag p. So an initial value of p can be determined from where the PACF plot cuts off.

2.2.3 Choosing q: the MA order

For an MA(q) model, looking at its ACF plot, ideally ACF(k) = 0 when k > q, i.e. the ACF cuts off after lag q. So an initial value of q can be determined from where the ACF plot cuts off.

2.3 Using the ARIMA Model

2.3.1 Python

See the following Jupyter Notebook:

Jupyter Notebook: ARIMA in Python

Footnotes:

3

"Time Series Analysis: R Language" Chapter 6 6.2 partial autocorrelation function and the autocorrelation function expansion



Origin: www.cnblogs.com/petewell/p/11607173.html