Note: This is a hands-on machine learning project (with data, code, documentation, and a video walkthrough). If you need these materials, go directly to the end of the article to get them.
1. Project background
SARIMA (seasonal autoregressive integrated moving average) extends ARIMA to periodic time series. The periodicity is handled first by applying ARIMA-style operations at the seasonal interval (differencing and AR/MA terms at the seasonal lag). This yields a series without the periodic component, which may still be non-stationary, and an ordinary ARIMA is then applied on top of it.
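As a minimal illustration of this two-stage idea, the pandas sketch below builds a hypothetical monthly series (a quadratic trend plus a 12-month cycle; not the project's data), removes the cycle with a seasonal difference at lag 12, and then removes the trend that is left with an ordinary first difference.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series (hypothetical, not the project's data):
# a quadratic trend plus a 12-month seasonal cycle.
n, s = 65, 12
t = np.arange(n)
y = pd.Series(0.05 * t**2 + 10 * np.sin(2 * np.pi * t / s))

# Seasonal difference (1 - B^s) y_t = y_t - y_{t-s}: the periodic part cancels,
# but a linear trend 1.2*t - 7.2 remains, so the series is still non-stationary.
seasonal_diff = y.diff(s)

# An ordinary first difference (1 - B), as in ARIMA, then removes that trend.
both_diff = seasonal_diff.diff(1)

print(seasonal_diff.dropna().head(3).round(2).tolist())  # [7.2, 8.4, 9.6]
print(both_diff.dropna().round(2).unique())               # [1.2]
```

Each differencing step costs observations at the start of the series (here 12 + 1), which matters for a dataset as short as this one.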
This project uses the SARIMA algorithm for modeling, forecasting, and analysis, forming a complete hands-on project.
2. Data acquisition
The modeling data comes from the Internet (compiled by the author of this project); the data items are summarized as follows:
The data details are as follows (partial display):
3. Data preprocessing
3.1 View the data with Pandas
Use the Pandas head() method to view the first five rows of the data:
key code:
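A minimal sketch of this step, assuming pandas. The project's data file is not reproduced here, so the code builds a hypothetical stand-in with the shape reported in 3.2 (65 monthly rows, 2 columns); the column names `date` and `x1` and the January 2017 start date are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the project's data file (not reproduced here):
# 65 monthly rows from Jan 2017 with a "date" column and the "x1" variable.
# With the real file you would load it instead (e.g. with pd.read_csv).
rng = np.random.default_rng(0)
dates = pd.date_range("2017-01-01", periods=65, freq="MS")
season = 10 * np.sin(2 * np.pi * np.arange(65) / 12)
df = pd.DataFrame({"date": dates, "x1": 50 + season + rng.normal(0, 2, 65)})

print(df.head())  # first five rows (head() defaults to n=5)
```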
3.2 Missing-value check
Use the Pandas info() method to view the data information:
As the figure above shows, the data contains 2 variables and 65 records in total, with no missing values.
key code:
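A matching sketch for the missing-value check, using the same hypothetical stand-in data; `isnull().sum()` gives an explicit per-column count alongside `info()`.

```python
import numpy as np
import pandas as pd

# Same hypothetical stand-in data: 65 monthly rows, columns "date" and "x1".
rng = np.random.default_rng(0)
dates = pd.date_range("2017-01-01", periods=65, freq="MS")
season = 10 * np.sin(2 * np.pi * np.arange(65) / 12)
df = pd.DataFrame({"date": dates, "x1": 50 + season + rng.normal(0, 2, 65)})

df.info()  # prints column names, dtypes and non-null counts

# Explicit per-column count of missing values; all zeros means nothing is missing.
missing = df.isnull().sum()
print(missing)
```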
3.3 Data descriptive statistics
Use the Pandas describe() method to view the mean, standard deviation, minimum, quartiles, and maximum of the data.
The key code is as follows:
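A sketch using the same hypothetical stand-in data. Calling `describe()` on the numeric `x1` column alone keeps the assumed date column out of the summary.

```python
import numpy as np
import pandas as pd

# Same hypothetical stand-in data: 65 monthly rows, columns "date" and "x1".
rng = np.random.default_rng(0)
dates = pd.date_range("2017-01-01", periods=65, freq="MS")
season = 10 * np.sin(2 * np.pi * np.arange(65) / 12)
df = pd.DataFrame({"date": dates, "x1": 50 + season + rng.normal(0, 2, 65)})

stats = df["x1"].describe()  # count, mean, std, min, 25%, 50%, 75%, max
print(stats)
```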
4. Exploratory Data Analysis
4.1 Time series plot of the x1 variable
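A minimal matplotlib sketch that draws this kind of plot, using the same hypothetical stand-in data; the Agg backend and the output file name are choices made for this sketch, not taken from the article.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Same hypothetical stand-in data: 65 monthly rows, columns "date" and "x1".
rng = np.random.default_rng(0)
dates = pd.date_range("2017-01-01", periods=65, freq="MS")
season = 10 * np.sin(2 * np.pi * np.arange(65) / 12)
df = pd.DataFrame({"date": dates, "x1": 50 + season + rng.normal(0, 2, 65)})

fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(df["date"], df["x1"], marker="o", markersize=3)
ax.set_xlabel("date")
ax.set_ylabel("x1")
ax.set_title("Time series of x1")
fig.tight_layout()
fig.savefig("x1_timeseries.png")  # output file name is arbitrary
```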
5. Build the SARIMA time series model
5.1 Key points of the SARIMA model
SARIMA has seven structural parameters: (p, d, q)(P, D, Q, s)
p: the number of lags of the series itself used in the forecasting model, i.e. the AR (autoregressive) order.
d: the number of times the series must be differenced to become stationary, i.e. the I (integrated) order.
q: the number of lags of the forecast error used in the forecasting model, i.e. the MA (moving average) order.
P: seasonal autoregressive order.
D: seasonal differencing order.
Q: seasonal moving average order.
s: the length of the seasonal period (e.g. 12 for monthly data with a yearly cycle).
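For reference, the seven parameters combine in the standard multiplicative SARIMA form below, written with the backshift operator $B$ ($B y_t = y_{t-1}$, so $B^s y_t = y_{t-s}$). The symbols $\phi, \Phi, \theta, \Theta$ and the noise term $\varepsilon_t$ are notation introduced here rather than taken from the article; the "+" sign convention for the MA polynomials matches statsmodels, and no constant term is included.

```latex
\begin{aligned}
\phi_p(B)\,\Phi_P(B^s)\,(1-B)^d\,(1-B^s)^D\,y_t &= \theta_q(B)\,\Theta_Q(B^s)\,\varepsilon_t,\\
\phi_p(B) &= 1 - \phi_1 B - \cdots - \phi_p B^p,\\
\Phi_P(B^s) &= 1 - \Phi_1 B^s - \cdots - \Phi_P B^{Ps},\\
\theta_q(B) &= 1 + \theta_1 B + \cdots + \theta_q B^q,\\
\Theta_Q(B^s) &= 1 + \Theta_1 B^s + \cdots + \Theta_Q B^{Qs}.
\end{aligned}
```

For example, the model selected later in this article, (0,0,0)(0,1,1,12), reduces to $(1 - B^{12})\,y_t = (1 + \Theta_1 B^{12})\,\varepsilon_t$, i.e. $y_t = y_{t-12} + \varepsilon_t + \Theta_1\,\varepsilon_{t-12}$.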
5.2 Stationarity test of the series
Figure 5.2-1 Time series plot of the original series
The series fluctuated strongly in 2017 and was relatively stable from 2021 to 2022; on visual inspection it can tentatively be judged a weakly stationary series.
Figure 5.2-2 Autocorrelation plot of the original series
The autocorrelation coefficients stay mostly between -0.2 and 0.2, indicating only weak short-term autocorrelation in the series.
Table 5.2-1 Unit root test of the original series
The p-value of the unit root test statistic in this table is below 0.05, so the null hypothesis of a unit root is rejected and the series is considered stationary.
5.3 First-order differencing of the original series, with stationarity and white noise tests
1) Stationarity check of the first-differenced series
Figure 5.3-1 Time series plot of the first-differenced series
Figure 5.3-2 Autocorrelation plot of the first-differenced series
Table 5.3-1 Unit root test of the first-differenced series
The results show that the first-differenced series fluctuates fairly evenly around its mean, its autocorrelation plot shows strong short-term correlation, and the p-value of the unit root test is below 0.05; therefore the first-differenced series is stationary.
2) White noise test of the first-differenced series
Table 5.3-2 White noise test of the first-differenced series
The p-value in this table is far below 0.05, so the white noise hypothesis is rejected: the first-differenced series is stationary but not white noise, meaning it still contains autocorrelation that a model can capture.
5.4 Fitting a SARIMA model to the first-differenced series
Next comes model order selection, i.e. determining p, d, q, P, D, Q, and s.
The first method: manual identification, determining the model order from the plot below.
Figure 5.4-1 Partial autocorrelation plot of the first-differenced series
The autocorrelation plot of the first-differenced series cuts off after lag 1, while the partial autocorrelation plot tails off, so an MA(1) model can be considered for the first-differenced series, i.e. a SARIMA model with order=(0, 1, 1) for the original series.
The second method: selecting the relatively optimal model by information criterion (recommended).
Fit SARIMA(order=(p, d, q), seasonal_order=(P, D, Q, s)) for a range of parameter combinations, compute the BIC of every combination, and take the orders of the model with the smallest BIC.
The calculated BIC matrix is as follows:
The minimum BIC, 143.5571, is obtained with p = 0, q = 0, P = 0, Q = 1, and s = 12, which fixes the model.
The selected model is therefore SARIMA(order=(0,0,0), seasonal_order=(0,1,1,12)) for the original series. Note that d = 0: rather than a first-order difference, this model applies one seasonal difference at lag 12 (D = 1) and one seasonal MA term (Q = 1), with no non-seasonal AR or MA terms. The fitted model is analyzed as follows:
1. The parameter estimates and their significance tests are shown in the table below:
Table 5.4-1 Model parameters
2. Residual diagnostics of the model:
Figure 5.4-2 Residual autocorrelation plot
Figure 5.4-3 Residual partial autocorrelation plot
Figure 5.4-4 Residual partial autocorrelation plot
Durbin-Watson (DW) test:
A DW value close to 0 indicates positive first-order autocorrelation, a value close to 4 indicates negative first-order autocorrelation, and a value close to 2 indicates no first-order autocorrelation.
The DW statistic is 1.3132. It is well away from 0 and 4, so the residuals show no strong first-order autocorrelation, although a value this far below 2 may point to mild positive autocorrelation.
The white noise test of the residual series returns a statistic of 8.01950276 and a p-value of 0.00462763. Since p < 0.05, the white noise hypothesis is rejected, which suggests that some autocorrelation remains in the residuals.
5.5 SARIMA model prediction
The SARIMA(order=(0,0,0), seasonal_order=(0,1,1,12)) model is used to forecast the project data for the next 5 months; the results are shown in the following table:
Table 5.5-1 Forecast data for the next 5 months
6. Conclusion and Outlook
In summary, this article builds a time series model with the SARIMA (seasonal autoregressive integrated moving average) algorithm and selects the optimal parameters by minimizing the BIC; the results indicate that the proposed model performs well.
The materials and resources for this hands-on machine learning project are as follows:
Project Description:
Link: https://pan.baidu.com/s/1dW3S1a6KGdUHK90W-lmA4w
Extraction code: bcbp