Mathematical Modeling: Time Series Analysis and ARIMA Model

This post introduces time series analysis and the forecasting models commonly used in mathematical modeling. It covers the most important of them, the ARIMA model, in detail, and walks through its modeling process with reference code.

Time series analysis

Time series analysis is a statistical method for processing time-related data, often used for forecasting, model building, and data understanding. The following are several common time series analysis algorithms and models:

  • Autoregressive Moving Average Model (ARMA): The ARMA model combines autoregressive (AR) and moving average (MA) models to capture linear relationships in data. The AR part uses observations from past time points to predict future values, while the MA part uses a linear combination of error terms to account for random fluctuations in the data.
  • Autoregressive Integrated Moving Average Model (ARIMA): The ARIMA model extends the ARMA model to handle non-stationary data. It consists of three parts: autoregression (AR), differencing (I, for "integrated") and moving average (MA). Differencing converts a non-stationary time series into a stationary one, to which an ARMA model can then be applied for modeling and prediction. The general form is ARIMA(p, d, q), where p, d and q are the orders of the autoregressive term, the differencing and the moving average term respectively. When the differencing order d is 0, the ARIMA model reduces to the ARMA model.
  • Seasonal Autoregressive Moving Average Model (SARMA): The SARMA model adds seasonal components to the ARMA model. It is suitable for time series data with obvious seasonal changes. By introducing seasonal autoregressive terms and seasonal moving average terms, the SARMA model can better capture the seasonal pattern of the data.
  • Seasonal Autoregressive Integrated Moving Average Model (SARIMA): The SARIMA model extends the SARMA model to non-stationary time series by introducing differencing. Ordinary differencing removes the trend and seasonal differencing removes the seasonal pattern, which makes the data stationary.
  • Random Walk: Random walk is a simple time series model. The basic assumption is that the future value is equal to the current value plus a random error term. This model assumes that future changes are random, with no clear trend or seasonality.
  • Seasonal Exponential Smoothing: This method is suitable for time series data with seasonal changes. It builds on the exponential smoothing model, which predicts future observations from a weighted average of past observations, and adds a seasonal component to that average.
  • Recurrent Neural Network (RNN): An RNN is a deep learning model capable of processing sequence data. In time series analysis, RNNs can be used to capture temporal dependencies in the data and to handle more complex forecasting tasks.
  • Long Short-Term Memory Network (LSTM): LSTM is a special variant of RNN that is better able to handle long-term dependencies. In time series analysis, LSTM can capture the long-term dependencies of data and is suitable for complex time series forecasting and modeling tasks.
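The random walk and the effect of differencing described above can be illustrated with a minimal sketch. It uses only NumPy; the seed, the series length, and the helper `lag1_autocorr` are assumptions made for this illustration and are not part of the original post.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Random walk: y_t = y_{t-1} + e_t, i.e. the cumulative sum of the error terms
errors = rng.normal(loc=0.0, scale=1.0, size=500)
walk = np.cumsum(errors)

# First-order differencing recovers the error terms (all but the first one)
diffed = np.diff(walk)


def lag1_autocorr(x):
    """Sample autocorrelation at lag 1 (hypothetical helper for this sketch)."""
    x = x - x.mean()
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))


print("lag-1 autocorrelation of the walk:       ", round(lag1_autocorr(walk), 3))
print("lag-1 autocorrelation after differencing:", round(lag1_autocorr(diffed), 3))
```

The undifferenced walk is strongly autocorrelated, while its first difference behaves like the underlying random errors. This is the reason ARIMA differences a non-stationary series before applying an ARMA model.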

ARIMA model

Introduction

ARIMA stands for "Autoregressive Integrated Moving Average". It predicts the current value from the historical values of the time series and from the prediction errors on those historical values. ARIMA combines the autoregressive term AR and the moving average term MA, and can model non-seasonal time series that show some regularity. If the time series is seasonal, SARIMA (Seasonal ARIMA) should be used instead.

The ARIMA model has three hyperparameters: p, d, q.

  • p: The order of the AR (autoregressive) term. It is set in advance and means that the current value of y depends on the previous p historical values.
  • d: The minimum differencing order that makes the series stationary, usually 1. A non-stationary series can be differenced to obtain a stationary one, but excessive differencing destroys the autocorrelation of the series, and with it the basis for using the AR term.
  • q: The order of the MA (moving average) term. It is set in advance and means that the current value of y depends on the prediction errors at the previous q time points. In effect, the prediction errors on historical values are used as regressors in a regression-like model.
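For reference, the three hyperparameters can be read off the usual textbook form of the model. The notation below (backshift operator B, constant c, coefficients φ and θ, error term ε) is an assumption added here and does not appear in the original post. Some references write the moving average terms with minus signs; the plus-sign convention is used here.

```latex
% d-th order differencing of the original series y_t (B is the backshift operator, B y_t = y_{t-1})
y'_t = (1 - B)^d \, y_t

% ARMA(p, q) model on the differenced series
y'_t = c + \sum_{i=1}^{p} \phi_i \, y'_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j}
```

With d = 0 the first line leaves the series unchanged, and the second line is the plain ARMA(p, q) model.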

Modeling steps

The schematic diagram of the modeling steps is shown below:
[Figure: schematic diagram of the modeling steps]

  1. Stationarity test and determination of the differencing order d
    Perform a stationarity test (the ADF unit root test) on the original series. If it is stationary, no differencing is needed and d = 0. Otherwise, difference the series until it passes the stationarity test. In addition, plot the ACF and inspect it: if the ACF tails off over too many lags (for example, 10 or more), further differencing is needed; if it cuts off almost immediately (for example, after lag 1), the series has been over-differenced. The best differencing order is one for which the ACF tails off over a few lags and then cuts off.
  2. White noise test
    Test whether the original or differenced series is white noise. Only a series that is not white noise can be fitted and predicted with an ARMA(p, q) model.
  3. Determination of the orders p and q
    The AICc (the corrected AIC) and BIC criteria are generally used to determine p and q: the smaller the information criterion, the better the parameter choice. Using exhaustive search, ARIMA models with the given d and different values of p and q are fitted, their AICc and BIC values are computed, and the parameters with the smallest value are selected as the optimal model.

Note: Do not use the information criterion to choose the differencing order d. Differencing changes the data on which the likelihood function is computed, so information criteria from models with different d are not comparable. Choose a suitable d by other methods first, then use the information criterion to determine p and q. An advantage of the information criterion is that it evaluates the model's hyperparameters quantitatively before the model is used for prediction. This is especially useful in batch prediction, where the order usually has to be selected automatically while the program runs.

  4. After determining the three parameters d, p and q, the model can be built and used to fit the data and make predictions.

Code example

Here is an example of the modeling process applied to a concrete dataset:

Note: The code is incomplete and is for reference only

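# Import the required libraries
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller, arma_order_select_ic
from statsmodels.graphics.tsaplots import plot_acf
from statsmodels.stats.diagnostic import acorr_ljungbox
from statsmodels.tsa.arima.model import ARIMA

# Read the time series data; replace the file and column names with your own
data = pd.read_csv('data.csv', index_col='date', parse_dates=True)
series = data.iloc[:, 0]  # assumes the first column holds the values to model

# Check stationarity with the ADF unit root test
result = adfuller(series)
# Extract the test results
adf_statistic = result[0]
p_value = result[1]
critical_values = result[4]
# Print the test results
print("ADF Statistic:", adf_statistic)
print("p-value:", p_value)
print("\nCritical Values:")
for key, value in critical_values.items():
    print(f"{key}: {value}")

# Plot the ACF
plot_acf(series)
plt.show()

# If the data is non-stationary, difference it until it becomes stationary
d = 1  # differencing order; use 0 if the ADF test already indicates stationarity
data_diff = series.copy()
for _ in range(d):
    data_diff = data_diff.diff().dropna()

# White noise test (Ljung-Box test)
# Recent statsmodels versions return a DataFrame with columns lb_stat and lb_pvalue
lb_result = acorr_ljungbox(data_diff, lags=[10])
print("Ljung-Box test result:")
print(lb_result)

# Determine the orders of ARMA(p, q) on the differenced series
order_select = arma_order_select_ic(data_diff, ic=['aic', 'bic'])
order_aic = order_select.aic_min_order
order_bic = order_select.bic_min_order
print("Best order by AIC:", order_aic)
print("Best order by BIC:", order_bic)

# Build the ARIMA model (the BIC order is used here; the AIC order is an alternative)
p, q = order_bic
model = ARIMA(series, order=(p, d, q))
# model = sm.tsa.SARIMAX(seasonal_data, order=(p, d, q))  # seasonal time series model
model_fit = model.fit()

# Inspect the fitted model
print(model_fit.summary())

# Forecast future values
num_steps = 10  # number of future periods to forecast; adjust as needed
forecast = model_fit.forecast(steps=num_steps)

# Print the forecast
print(forecast)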

Origin blog.csdn.net/weixin_43603658/article/details/133089284