Three methods of time series prediction: statistical models, machine learning, and recurrent neural networks


Time series prediction is a classic class of problems that has been researched and applied extensively in both academia and industry. It is sometimes said that almost anything in the world can be abstracted into a time series problem once the time dimension is added: stock prices, weather changes, and so on. The theory around time series prediction is correspondingly broad: besides the various classic statistical models, today's popular machine learning methods and deep learning recurrent neural networks can also be used to model time series prediction problems. This article introduces a simple application of each of these three approaches and verifies them on a real time series data set.

[Figure: monthly airline passenger counts, 1949-1960]

The main task of time series forecasting is to predict the future values of some indicator from its historical data. For example, the curve in the figure above records the number of monthly airline passengers over the 144 months of the 12 years from 1949 to 1960 (the exact unit has not been verified). The question time series forecasting tries to answer is: given the historical data of the first 9 years, 1949-1957, can the passenger numbers for the three years 1958-1960 be predicted?

There are roughly four mainstream approaches to this problem:

  • Statistical models: most classically the AR family, including AR, MA, ARMA, ARIMA, and so on. In addition, the Prophet model released by Facebook (strictly speaking it should now be called Meta) is essentially also a statistical model, but on top of the traditional trend and seasonal components it further models the influence of factors such as holidays and changepoints, in order to capture more accurate temporal patterns;

  • Machine learning models: in supervised machine learning, the regression problem is to predict the possible value of a label from a set of features, so once the historical data is treated as features, it is natural to abstract the time series prediction problem as a regression problem. From this perspective, any regression model can be used for time series forecasting. For abstracting time series forecasting as machine learning, the paper "Machine Learning Strategies for Time Series Forecasting" is recommended;

  • Deep learning models: the mainstream application scenarios of deep learning are CV and NLP, and the latter is precisely about modeling sequence problems. A time series is of course just a special form of sequence data, so it is natural to use recurrent neural networks to model time series forecasting;

  • Hidden Markov models: the Markov model is a classic abstraction for describing transitions between adjacent states, and the hidden Markov model further adds hidden states to enrich the model's expressive power. However, one of its major assumptions is that the future state depends only on the current state, which makes it hard to use multiple historical states in the prediction; its best-known application is probably the textbook weather-forecasting example.

This article considers only the first three modeling approaches, and selects three concrete schemes to test: 1) the Prophet model, 2) a RandomForest regression model, and 3) an LSTM.

All three are tested on the real airline-passenger data set, and their prediction accuracy is compared in turn. The data set contains 12 years of monthly passenger counts. January 1958 is used as the split point between training and test sets: the first 9 years of data form the training set, and the last 3 years form the test set used to verify each model. The split data set looks like this:

import pandas as pd
import matplotlib.pyplot as plt

# Load the data and rename the columns to Prophet's expected ds/y schema
df = pd.read_csv("AirPassengers.csv", parse_dates=["date"]).rename(columns={"date": "ds", "value": "y"})
X_train = df[df.ds < "19580101"]
X_test = df[df.ds >= "19580101"]


# Plot the training and test segments
plt.plot(X_train['ds'], X_train['y'])
plt.plot(X_test['ds'], X_test['y'])

[Figure: training set (1949-1957) and test set (1958-1960) after the split]

1. Prophet model prediction. Prophet is a highly encapsulated time series prediction model: it accepts a DataFrame as the training set (the ds and y columns are required), and prediction also takes a DataFrame, which at that point only needs a ds column. For a detailed introduction to the model, see the official documentation: https://facebook.github.io/prophet/. The core training and prediction code is as follows:

from prophet import Prophet

# Fit on the training set and predict over the test period
pro = Prophet()
pro.fit(X_train)
pred = pro.predict(X_test)


# Prophet's built-in forecast plot
pro.plot(pred)

The fitted result looks like this:

[Figure: Prophet's built-in forecast plot]

Of course, this is the plot produced by Prophet's built-in visualization; we can also manually plot the test set's true labels against the predictions:
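A minimal sketch of that comparison, assuming the pred frame returned by predict above (yhat is Prophet's point-forecast column):

# Overlay test-set ground truth and Prophet's point forecast
plt.plot(X_test['ds'], X_test['y'], label='actual')
plt.plot(pred['ds'], pred['yhat'], label='predicted')
plt.legend()
plt.show()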

[Figure: test-set ground truth vs. Prophet predictions]

It is easy to see that although the overall trend is fitted quite well, the gap at specific values is actually fairly large.
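To make that gap concrete, one could compute standard error metrics on the test set; a minimal sketch, assuming the X_test and pred frames from above:

import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Quantify the gap between truth and forecast
mae = mean_absolute_error(X_test['y'], pred['yhat'])
rmse = np.sqrt(mean_squared_error(X_test['y'], pred['yhat']))
print(f"MAE: {mae:.1f}, RMSE: {rmse:.1f}")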

2. Machine learning model. Here the RandomForest model, which often serves as a baseline, is selected. When using machine learning for time series prediction, features and labels are usually extracted through a sliding window, and at prediction time the test-set features are likewise windowed to perform single-step prediction. Following the practice of the paper "Machine Learning Strategies for Time Series Forecasting", the problem can be roughly depicted as follows:

[Figure: sliding-window formulation of time series forecasting as a regression problem, after the paper above]

Accordingly, the feature window length is set to 12, and the training and test sets are constructed as follows:

from sklearn.ensemble import RandomForestRegressor

# Build 12 lag features with a sliding window
data = df.copy()
n = 12
for i in range(1, n+1):
    data['ypre_'+str(i)] = data['y'].shift(i)
data = data[['ds']+['ypre_'+str(i) for i in range(n, 0, -1)]+['y']]


# Extract training and test sets
X_train = data[data['ds']<"19580101"].dropna()[['ypre_'+str(i) for i in range(n, 0, -1)]]
y_train = data[data['ds']<"19580101"].dropna()[['y']]
X_test = data[data['ds']>="19580101"].dropna()[['ypre_'+str(i) for i in range(n, 0, -1)]]
y_test = data[data['ds']>="19580101"].dropna()[['y']]


# Model training and prediction
rf = RandomForestRegressor(n_estimators=10, max_depth=5)
rf.fit(X_train, y_train.values.ravel())
y_pred = rf.predict(X_test)


# Plot predictions against ground truth
y_test.assign(yhat=y_pred).plot()

[Figure: test-set ground truth vs. RandomForest predictions]

It can be seen that the prediction quality is fairly mediocre; for the last two years in particular, the gap from the true values is still quite large. This phenomenon is easy to explain with machine-learning thinking: the random forest model is really just learning patterns between the curves from the training data. Since the series as a whole trends upward over time, the highest points in the historical data cannot cover the larger values in the future, so any test-set labels that exceed the historical range simply cannot be fitted.
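This limitation is easy to demonstrate in isolation; a minimal sketch on synthetic (hypothetical) data:

import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Fit on a strictly increasing trend, then ask for values beyond the training range
X = np.arange(100).reshape(-1, 1).astype(float)
y = np.arange(100).astype(float)
rf_demo = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
print(rf_demo.predict([[150.0], [200.0]]))  # both stay near 99, the training maximum

Tree ensembles predict averages of training labels, so their output is capped at the training maximum and they cannot extrapolate a trend.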

3. Recurrent neural network. Deep learning generally requires a large data set to fully show its advantages, and the data set here is obviously very small, so only the simplest of models is designed: 1 LSTM layer + 1 Linear layer. The model is built as follows:

import torch
from torch import nn, optim


class Model(nn.Module):
    def __init__(self):
        super().__init__()
        # Single-layer LSTM over a univariate sequence, 10 hidden units
        self.rnn = nn.LSTM(input_size=1, hidden_size=10, batch_first=True)
        self.linear = nn.Linear(10, 1)

    def forward(self, x):
        x, _ = self.rnn(x)
        x = x[:, -1, :]        # keep only the output at the last time step
        x = self.linear(x)
        return x
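As a quick sanity check (with hypothetical shapes), the model maps a (batch, seq_len, 1) input to a (batch, 1) output:

model = Model()
dummy = torch.randn(4, 12, 1)   # batch of 4 windows, 12 time steps, 1 feature
print(model(dummy).shape)       # torch.Size([4, 1])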

The overall idea of data set construction is the same as in the machine learning part above; then, following the usual training "alchemy", some of the results are as follows:

# Convert features to 3D tensors (batch, seq_len, 1); keep targets 2D (batch, 1)
# to match the model's output shape
X_train_3d = torch.Tensor(X_train.values).reshape(*X_train.shape, 1)
y_train_2d = torch.Tensor(y_train.values)
X_test_3d = torch.Tensor(X_test.values).reshape(*X_test.shape, 1)
y_test_2d = torch.Tensor(y_test.values)


# Model, loss criterion, optimizer
model = Model()
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters())


# Training loop
for i in range(1000):
    out = model(X_train_3d)
    loss = criterion(out, y_train_2d)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if (i+1) % 100 == 0:
        y_pred = model(X_test_3d)
        loss_test = criterion(y_pred, y_test_2d)
        print(i, loss.item(), loss_test.item())


# Training output: epoch, train loss, test loss
99 65492.08984375 188633.796875
199 64814.4375 187436.4375
299 64462.09765625 186815.5
399 64142.70703125 186251.125
499 63835.5 185707.46875
599 63535.15234375 185175.1875
699 63239.39453125 184650.46875
799 62947.08203125 184131.21875
899 62657.484375 183616.203125
999 62370.171875 183104.671875

From these 1000 epochs, it is safe to infer that the model is not going to fit this data well, so we decisively give up!

Of course, it must be pointed out that the results above only describe how the three schemes perform on this particular data set; they say nothing about how these model families perform on time series prediction in general. In fact, time series forecasting is very much a scenario where each problem requires its own analysis, and there is no universally good model, just as the "No Free Lunch" theorem suggests!

This article is only a first small experiment in this series of posts on time series forecasting; related experience and summaries will be added from time to time.



Source: https://blog.csdn.net/weixin_43841688/article/details/122053781