Time series analysis with the ARIMA model in Python

Hi everyone. Time series analysis is widely used to forecast future data points, and ARIMA is one of the most popular models for the task. This article shows how to build and evaluate an ARIMA model for time series forecasting in Python.

What is the ARIMA model?

The ARIMA model is a statistical model used to analyze and predict time series data. The ARIMA method explicitly adapts to common structures in time series, providing a simple yet powerful method for accurate time series forecasting.

ARIMA is the abbreviation of AutoRegressive Integrated Moving Average. It combines three key aspects:

  • Autoregressive (AR): A model built using the correlation between current observations and lagged observations. The number of lagged observations is called the lag order, or p.

  • Integrated (I): Making the time series stationary by differencing the original observations, that is, subtracting the previous observation from the current one. The number of differencing operations is called the degree of differencing, or d.

  • Moving Average (MA): A model that uses the dependency between the current observation and the residual errors of a moving average model applied to lagged observations. The size of the moving average window is called the order of the moving average, or q.

An ARIMA model is represented as ARIMA(p,d,q), where p, d, and q are replaced with integer values to specify the exact model used.
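
To make the role of d concrete, here is a minimal sketch using NumPy's diff function on two made-up series (the values are purely illustrative and are not taken from the Netflix data). One round of differencing removes a linear trend, and two rounds remove a quadratic one.

```python
import numpy as np

# Hypothetical toy series, for illustration only.
t = np.arange(6)
linear = 3.0 * t + 10.0      # linear trend
quadratic = t ** 2.0         # quadratic trend

# d = 1: first difference, y[t] - y[t-1]
linear_d1 = np.diff(linear, n=1)
# d = 2: the difference of the first difference
quadratic_d2 = np.diff(quadratic, n=2)

print(linear_d1)     # constant values: the linear trend is gone
print(quadratic_d2)  # constant values after two rounds of differencing
```

Each round of differencing shortens the series by one observation, which is why the outputs have five and four values respectively.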

Key assumptions when using the ARIMA model:

  • The time series is generated by the underlying ARIMA process.

  • The parameters p, d and q must be appropriately specified based on the original observations.

  • The time series must be stationary once it has been differenced d times, so d should be chosen large enough to remove any trend.

  • If the model fits well, the residuals should be uncorrelated and normally distributed.

All in all, ARIMA models provide a structured and configurable method for modeling time series data for purposes such as forecasting. Next, we will introduce how to fit ARIMA models in Python.

Python code example

In this article, we will use the Netflix stock data available on Kaggle to predict Netflix stock prices with the ARIMA model.

  • Data loading

This example loads the stock price data set and uses the "Date" column as the index.

import pandas as pd


net_df = pd.read_csv("Netflix_stock_history.csv", index_col="Date", parse_dates=True)
net_df.head(3)

  • Data visualization

You can use the pandas plot function to visualize how the closing price and trading volume change over time. The plot suggests that the stock price has grown roughly exponentially.

net_df[["Close","Volume"]].plot(subplots=True, layout=(2,1));

  • Rolling Forecast ARIMA Model

The code below splits the data set into a training set (the first 90% of the rows) and a test set (the remaining 10%), fits an ARIMA model on the training data, and makes the first prediction.

A single ARIMA fit used to forecast the whole test period gave a poor result: the forecast was a flat line. For this example we therefore use the rolling forecast method instead.

NOTE: The code example is a modified version of BOGDAN IVANYUK's notebook.

from statsmodels.tsa.arima.model import ARIMA
from sklearn.metrics import mean_squared_error, mean_absolute_error
import math


train_data, test_data = net_df[0:int(len(net_df)*0.9)], net_df[int(len(net_df)*0.9):]


train_arima = train_data['Open']
test_arima = test_data['Open']


history = list(train_arima)
y = test_arima
# Make the first prediction
predictions = list()
model = ARIMA(history, order=(1,1,0))
model_fit = model.fit()
yhat = model_fit.forecast()[0]
predictions.append(yhat)
# y is indexed by date, so use positional indexing
history.append(y.iloc[0])

When working with time series data, it is often necessary to make rolling forecasts due to dependence on previous observations. One approach is to recreate the model after each new observation is received.

To keep track of all observations, we manually maintain a list called history, which initially contains the training data and to which we append each new observation at every iteration. Every forecast is therefore based on all the data available up to that point.

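# Rolling forecast
for i in range(1, len(y)):
    # Refit the model on all observations seen so far
    model = ARIMA(history, order=(1,1,0))
    model_fit = model.fit()
    # Forecast one step ahead
    yhat = model_fit.forecast()[0]
    predictions.append(yhat)
    # Add the actual observation to the history
    obs = y.iloc[i]
    history.append(obs)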
  • Model evaluation

The rolling forecast ARIMA model in this example is a clear improvement over the simple implementation, which produced only a flat line. The error metrics are reported below.

# Report performance
mse = mean_squared_error(y, predictions)
print('MSE: '+str(mse))
mae = mean_absolute_error(y, predictions)
print('MAE: '+str(mae))
rmse = math.sqrt(mse)
print('RMSE: '+str(rmse))

Output:

MSE: 116.89611817706545
MAE: 7.690948135967959
RMSE: 10.811850821069696
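
Error metrics are easier to interpret next to a baseline. A simple one for one-step-ahead forecasts is the persistence forecast, which predicts that the next value equals the last observed value. The helper below, persistence_errors, is a hypothetical function written for this illustration, and the numbers passed to it are toy values rather than Netflix prices.

```python
import math
import numpy as np

def persistence_errors(train, test):
    """Return MAE and RMSE of a forecast that repeats the last observed value."""
    train = np.asarray(train, dtype=float)
    test = np.asarray(test, dtype=float)
    # The forecast for step i is the actual value at step i - 1; the first
    # forecast uses the last training observation.
    forecasts = np.concatenate(([train[-1]], test[:-1]))
    errors = test - forecasts
    mae = float(np.mean(np.abs(errors)))
    rmse = math.sqrt(float(np.mean(errors ** 2)))
    return mae, rmse

# Toy numbers for illustration only.
mae, rmse = persistence_errors([10.0, 11.0, 12.0], [13.0, 15.0, 14.0])
print(mae, rmse)
```

With the variables from this article, calling persistence_errors(train_arima, test_arima) would give baseline values to compare against the MAE and RMSE reported above.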

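Next, we plot the actual prices against the predicted ones. The two curves track each other closely. Keep in mind that each prediction is a one-step-ahead forecast made after the previous actual price was added to history, so the close match reflects short-horizon accuracy rather than long-range forecasting ability.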

import matplotlib.pyplot as plt
plt.figure(figsize=(16,8))
plt.plot(net_df.index[-600:], net_df['Open'].tail(600), color='green', label = 'Train Stock Price')
plt.plot(test_data.index, y, color = 'red', label = 'Real Stock Price')
plt.plot(test_data.index, predictions, color = 'blue', label = 'Predicted Stock Price')
plt.title('Netflix Stock Price Prediction')
plt.xlabel('Time')
plt.ylabel('Netflix Stock Price')
plt.legend()
plt.grid(True)
plt.savefig('arima_model.pdf')
plt.show()

To summarize, in this article we have provided an overview of the ARIMA model and how to implement time series forecasting in Python. The ARIMA method provides a flexible and structured way to model time series data, relying on previous observations and past forecast errors.

Origin blog.csdn.net/csdn1561168266/article/details/132438817