Today I used Python to build an ARIMA model. Fitting went smoothly, but I ran into a big problem at the prediction stage. I searched for a long time, in both Chinese and English sources and in the documentation, and found no way to solve it. It may be that Python's statsmodels.tsa.arima_model was simply not designed with this situation in mind. This article first covers some basics and precautions for building an ARIMA model in Python, and then describes the problem.

1. statsmodels.tsa.arima_model considerations

There are already many introductions to this ARIMA package online: two Python ARIMA walkthroughs, the official documentation, and a good English tutorial. Between them you can basically understand how to use it. Here I will mainly cover a few points that need attention in practice.

The data can be a pandas Series. First, difference the data until it is stationary; the number of differences needed is the order d. Then plot the autocorrelation (ACF) and partial autocorrelation (PACF) functions to determine p and q in turn (p is usually read off the PACF and q off the ACF), which gives the basic form of the model, ARIMA(p, d, q).
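The differencing and autocorrelation steps above can be sketched in plain Python. This is only an illustration under stated assumptions: the `prices` list is made-up data, and `difference` and `autocorrelation` are hypothetical helper names written here, not statsmodels functions.

```python
def difference(series, d=1):
    """Apply first-order differencing d times."""
    values = list(series)
    for _ in range(d):
        values = [curr - prev for prev, curr in zip(values, values[1:])]
    return values


def autocorrelation(series, lag):
    """Sample autocorrelation of the series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    numer = sum(
        (series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n)
    )
    return numer / denom


# Hypothetical prices with an upward trend (made-up numbers).
prices = [10.0, 11.0, 13.0, 12.0, 15.0, 17.0, 16.0, 19.0]

# One round of differencing (d=1) turns levels into day-to-day changes.
changes = difference(prices, d=1)
print(changes)

# Autocorrelation of the differenced series at the first few lags.
for lag in range(1, 4):
    print(lag, round(autocorrelation(changes, lag), 3))
```

In practice you would plot these values for many lags and look at where they cut off, rather than read three numbers from a short series.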
Training the model only requires calling fit. Note its method parameter, which selects one of three estimation methods, {'css-mle', 'mle', 'css'}; the default is 'css-mle'. If training raises an error, try 'css', which generally fails less often. Errors such as failure to converge or a matrix that cannot be inverted are usually attributed to an unsuitable estimation method, so try another one.
There are two ways to predict: predict and forecast. predict is usually enough; I have not fully understood forecast. The signature is ARIMA.predict(params, start=None, end=None, exog=None, typ='linear', dynamic=False): you set a start and an end, and it returns an array with the predictions from start to end. The prediction can be in-sample or out-of-sample, and start and end can be indexes or date strings.
One of the parameters of predict is typ, whose default is 'linear', so by default the output is the differenced values rather than values at the original level. Set typ='levels' to get predictions at the original level.
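To make the difference between the two outputs concrete, here is a minimal sketch of how predicted changes relate to predicted levels. It assumes d = 1 and an out-of-sample multi-step forecast; `to_levels` is a hypothetical helper, and the numbers are made up. It is not a description of what statsmodels does internally.

```python
from itertools import accumulate


def to_levels(last_observed, predicted_diffs):
    """Undo first-order differencing by cumulatively adding the
    predicted changes to the last known level."""
    return [last_observed + total for total in accumulate(predicted_diffs)]


last_price = 100.0                   # last observed value (made up)
predicted_changes = [1.5, -0.5, 2.0]  # what a 'linear'-style output looks like

levels = to_levels(last_price, predicted_changes)
print(levels)  # [101.5, 101.0, 103.0]
```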

2. Existing problems

To make the problem easier to understand, let me start with a real requirement. I trained an ARIMA model to predict stock prices, but I do not want to retrain it every day. I would like to keep using last week's model this week, while feeding it the new data I have received this week as input for predicting tomorrow's value. predict cannot do this, because when it performs a multi-step prediction it first makes a single-step prediction, then uses that predicted value as input for another single-step prediction to obtain the two-step value. What I need instead is: train the model on the sequence t0-t5, predict t6, then supply the real t6 to predict t7, rather than predicting t7 from the predicted t6.

The lack of this feature causes a lot of trouble. For example, suppose I want to compare the ARIMA model against a machine learning model such as an SVM. We split the data set into a training set and a test set and train the model on the training set. Once the model is fixed, we evaluate it on the test set: feed in the test set, get the predicted values, and compute, for example, the MSE. statsmodels.tsa.arima_model cannot do this, because there is no way to pass the test set in as input for prediction.

I have not solved this problem after a long search. If anyone knows what to do, please leave a message, thank you very much!
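The difference between the two prediction modes described above can be shown with a hand-rolled AR(1) model, y[t] = c + phi * y[t-1]. This sketch does not use statsmodels and is not a fix for the library; it is a simplified stand-in for ARIMA, and all function names and numbers are hypothetical.

```python
def fit_ar1(series):
    """Least-squares estimate of c and phi in y[t] = c + phi * y[t-1]."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    phi = sxy / sxx
    return mean_y - phi * mean_x, phi


def recursive_forecast(c, phi, last_value, steps):
    """Multi-step forecast: each prediction is fed back as the next input."""
    preds, prev = [], last_value
    for _ in range(steps):
        prev = c + phi * prev
        preds.append(prev)
    return preds


def rolling_one_step(c, phi, last_train_value, test):
    """One-step forecasts over the test set with fixed parameters,
    using the real previous observation each time."""
    preds, prev = [], last_train_value
    for actual in test:
        preds.append(c + phi * prev)
        prev = actual  # feed in the true value, not the prediction
    return preds


def mse(actual, predicted):
    """Mean squared error between two equal-length sequences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Made-up data: t0-t5 for training, t6-t8 for testing.
train = [10.0, 10.8, 10.3, 11.1, 10.6, 11.4]
test = [10.9, 11.7, 11.2]

c, phi = fit_ar1(train)  # parameters are estimated once and never refit
fed_back = recursive_forecast(c, phi, train[-1], len(test))
with_truth = rolling_one_step(c, phi, train[-1], test)

print("recursive MSE:", mse(test, fed_back))
print("rolling one-step MSE:", mse(test, with_truth))
```

Both modes agree on the first prediction (t6) and differ from t7 onward. The second mode is the one needed for a fair test-set comparison against a model such as an SVM.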
