[Data analysis] Predictive analysis using machine learning algorithms (4): Autoregressive Integrated Moving Average Model (Auto ARIMA) (2021-01-18)

Machine learning methods in time series forecasting (4): Autoregressive Integrated Moving Average model (Auto ARIMA)

This is the fourth article in the series "Machine Learning Methods in Time Series Forecasting". If you are interested, you can read the previous articles first:
[Data analysis] Predictive analysis using machine learning algorithms (1): Moving Average
[Data analysis] Predictive analysis using machine learning algorithms (2): Linear Regression
[Data analysis] Predictive analysis using machine learning algorithms (3): K-Nearest Neighbours

The Autoregressive Integrated Moving Average model (ARIMA) is a very popular statistical method for time series forecasting. An ARIMA model uses the past values of a series to predict its future values. It has three important parameters:

  • p (the number of past values used to predict the next value)
  • q (the number of past forecast errors used to predict future values)
  • d (the order of differencing)
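
To make the role of d concrete, here is a minimal sketch of differencing on a made-up series. The function name `difference` and the numbers are illustrative assumptions and are not part of this article's code.

```python
import numpy as np

def difference(series, d=1):
    """Apply first-order differencing d times (the "I" part of ARIMA)."""
    values = np.asarray(series, dtype=float)
    for _ in range(d):
        # Each pass replaces the series with the change between neighbours
        values = values[1:] - values[:-1]
    return values

# Hypothetical series with a steady upward trend
prices = [10.0, 12.0, 14.0, 16.0, 18.0]
once = difference(prices, d=1)    # the trend becomes a constant step
twice = difference(prices, d=2)   # differencing again removes the step too
```

Each round of differencing shortens the series by one value, which is one reason d is usually kept small.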

Tuning the ARIMA parameters by hand takes a lot of time. Therefore, we use Auto ARIMA, which searches over combinations of (p, d, q) and automatically selects the one with the best information-criterion score (AIC by default in pmdarima).
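
The idea of an automatic order search can be sketched in a few lines. The code below is not how pmdarima works internally. It is a deliberately simplified stand-in that fits autoregressive-only models by least squares and ranks them by AIC. The helper names `ar_aic` and `select_order` and the simulated series are assumptions made for illustration.

```python
import numpy as np

def ar_aic(y, p, max_p):
    """AIC of an AR(p) model with intercept, fit by least squares.

    All candidates are scored on the same observations y[max_p:],
    so their AIC values are comparable.
    """
    y = np.asarray(y, dtype=float)
    target = y[max_p:]
    n = len(target)
    # Design matrix: intercept plus the p previous values of the series
    columns = [np.ones(n)] + [y[max_p - k:len(y) - k] for k in range(1, p + 1)]
    X = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    rss = float(np.sum((target - X @ coef) ** 2))
    return n * np.log(rss / n) + 2 * (p + 1)

def select_order(y, max_p=3):
    """Try p = 0..max_p and return the order with the lowest AIC."""
    scores = {p: ar_aic(y, p, max_p) for p in range(max_p + 1)}
    best_p = min(scores, key=scores.get)
    return best_p, scores

# Simulated AR(2) series (hypothetical data, fixed seed for repeatability)
rng = np.random.default_rng(0)
noise = rng.normal(size=500)
series = np.zeros(500)
for t in range(2, 500):
    series[t] = 0.6 * series[t - 1] - 0.5 * series[t - 2] + noise[t]

best_p, scores = select_order(series, max_p=3)
```

A full Auto ARIMA search also varies d and q and the seasonal orders, which is why it is left to the library in the rest of this article.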

The data set is the same as in the previous three articles, so that the forecasts of the different algorithms can be compared on the same data. The data set and code are on my GitHub, and readers who need them can download them there.

Import the packages and read in the data.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.read_csv('NSE-TATAGLOBAL11.csv')

Use the date as the index.

# setting the index as date
df['Date'] = pd.to_datetime(df.Date,format='%Y-%m-%d')
df.index = df['Date']

Split the original data into a training set and a validation set.

data = df.sort_index(ascending=True, axis=0)

train = data[:987]
valid = data[987:]

training = train['Close']
validation = valid['Close']
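
Why the data is sorted before it is sliced can be seen on a tiny example. The toy frame below is hypothetical and deliberately listed newest-first, and the 80/20 cut is an assumption for illustration (the article itself cuts at row 987).

```python
import pandas as pd

# Hypothetical toy data, deliberately listed newest-first
toy = pd.DataFrame({
    'Date': ['2021-01-05', '2021-01-04', '2021-01-03', '2021-01-02', '2021-01-01'],
    'Close': [105.0, 104.0, 103.0, 102.0, 101.0],
})
toy['Date'] = pd.to_datetime(toy['Date'], format='%Y-%m-%d')

# Index by date and sort oldest-first so that slicing respects time order
toy = toy.set_index('Date').sort_index()

# Chronological split: the first 80% of rows train, the rest validate
split_at = len(toy) * 4 // 5
toy_train = toy.iloc[:split_at]
toy_valid = toy.iloc[split_at:]
```

Because the rows are in time order, every training date precedes every validation date, so the model never sees the future it is asked to predict.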

Import the model and let it search for the best parameters p, d, q automatically.

from pmdarima import auto_arima

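# auto_arima searches the orders and returns an already fitted model
model = auto_arima(training, start_p=1, start_q=1, max_p=3, max_q=3, m=12, start_P=0,
                   seasonal=True, d=1, D=1, trace=True, error_action='ignore', suppress_warnings=True)
model.fit(training)

# Forecast one step per validation row; np.asarray aligns the values with valid.index by position
forecast = model.predict(n_periods=len(valid))
forecast = pd.DataFrame(np.asarray(forecast), index=valid.index, columns=['Prediction'])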

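The RMSE (root mean squared error) gives the typical size of the forecast error, in the same units as the closing price. The smaller it is, the closer the forecast is to the actual values.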

rmse = np.sqrt(np.mean(np.power((np.array(valid['Close'])-np.array(forecast['Prediction'])),2)))
rmse
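
The same calculation can be wrapped in a small reusable helper. The function name `root_mean_squared_error` and the sample numbers below are illustrative assumptions, not values from this data set.

```python
import numpy as np

def root_mean_squared_error(actual, predicted):
    """Square root of the mean squared difference between two sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Hypothetical prices: the errors are -1, 1 and -3
example = root_mean_squared_error([100.0, 102.0, 104.0], [101.0, 101.0, 107.0])
```

Because the errors are squared before averaging, a single large miss (the 3 here) weighs more heavily than several small ones.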

Plot the forecast to inspect it visually.

#plot
plt.figure(figsize=(16,8))
plt.plot(train['Close'])
plt.plot(valid['Close'])
plt.plot(forecast['Prediction'])
plt.show()

It can be seen that Auto ARIMA does not forecast this data set very well. The model does capture the overall trend of the time series, but it does not adequately reflect the influence of seasonal variation, so the forecast is not accurate enough.

Origin blog.csdn.net/be_racle/article/details/112780195