Time Series Analysis and Modeling in Python with the Seasonal Autoregressive Moving Average Model (SARIMA): A Hands-On Project

Note: This is a hands-on machine learning project (data + code + documentation + video explanation). If you need the data, code, documentation and video explanation, go directly to the end of the article to get them.

1. Project background

SARIMA is the seasonal autoregressive integrated moving average model. For a periodic time series, the periodicity needs to be removed first by working at the periodic interval (in practice, differencing the series at the seasonal lag s). The result is a series that is no longer periodic but may still be non-stationary, and ARIMA is then applied to it for further analysis.
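As a minimal sketch of removing periodicity by differencing at the seasonal lag, the example below builds a small artificial series (a linear trend plus a repeating 12-step pattern; both are assumptions for illustration, not the project data) and applies the pandas `diff(12)` method.

```python
import numpy as np
import pandas as pd

# Artificial series (assumption for illustration): linear trend + repeating 12-step pattern
pattern = np.array([0, 2, 4, 6, 4, 2, 0, -2, -4, -6, -4, -2], dtype=float)
t = np.arange(36)
y = pd.Series(0.5 * t + pattern[t % 12])

# Differencing at the seasonal lag (12) cancels the repeating pattern
seasonal_diff = y.diff(12).dropna()

# What remains is the trend's change over one full period
print(seasonal_diff.unique())
```

In this toy case the seasonally differenced series is constant. With real data the result is generally a non-periodic series that may still need ordinary differencing before it is stationary.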

This project uses the SARIMA algorithm for modeling, prediction, and analysis, forming a complete hands-on project.

2. Data acquisition

The modeling data comes from the Internet (compiled by the author of this project). The data items are summarized as follows:

 

The data details are as follows (partial display):

 

3. Data preprocessing

3.1 View the data with Pandas

Use the head() method of Pandas to view the first five rows of data:

Key code:
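The original code is not reproduced here, so the following is only a sketch of what the step might look like. The DataFrame is a synthetic stand-in for the project data: the column name `date`, the start month, and the generated values are assumptions; only the variable name `x1` and the count of 65 records come from the article.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the project data (assumption): 65 monthly values of x1
rng = np.random.default_rng(0)
dates = pd.date_range("2017-01-01", periods=65, freq="MS")
x1 = 10 + 3 * np.sin(2 * np.pi * np.arange(65) / 12) + rng.normal(0, 1, 65)
df = pd.DataFrame({"date": dates, "x1": x1})

# With the real file, df would instead be loaded from disk (for example with pd.read_csv)
print(df.head())  # first five rows
```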

3.2 Check for missing data

Use the info() method of Pandas to view the data information:

As can be seen from the figure above, there are 2 variables and 65 records in total, and the data has no missing values.

Key code:
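A minimal sketch of this check, again on a synthetic stand-in DataFrame (the column name `date` and the values are assumptions). `info()` prints the dtypes and non-null counts, and `isna().sum()` gives an explicit count of missing values per column.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the project data (assumption): 2 columns, 65 rows
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "date": pd.date_range("2017-01-01", periods=65, freq="MS"),
    "x1": rng.normal(10, 2, 65),
})

df.info()                   # column dtypes and non-null counts
missing = df.isna().sum()   # number of missing values per column
print(missing)
```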

 

3.3 Descriptive statistics

Use the describe() method of Pandas to view the mean, standard deviation, minimum, quantiles, and maximum of the data.

The key code is as follows:
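A sketch of the call on a synthetic stand-in series (the values are assumptions, not the project data):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the x1 variable (assumption)
rng = np.random.default_rng(0)
df = pd.DataFrame({"x1": rng.normal(10, 2, 65)})

# count, mean, std, min, quartiles and max in one call
stats = df["x1"].describe()
print(stats)
```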

4. Exploratory Data Analysis

4.1 Time series plot of the x1 variable

5. Build the SARIMA time series model

5.1 Key points of the SARIMA model

SARIMA has seven structural parameters: (p, d, q)(P, D, Q, s)

p: The number of lags of the time series itself used in the forecasting model, also called the AR (autoregressive) term.

d: The number of times the time series must be differenced to become stationary, also called the Integrated term.

q: The number of lags of the forecast error used in the forecasting model, also called the MA (moving average) term.

P: Seasonal autoregressive order.

D: Seasonal differencing order.

Q: Seasonal moving average order.

s: Length of the seasonal cycle (number of time steps per period).
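For reference, the seven parameters can be placed in the usual compact form of the model. The notation below is an assumption added here rather than taken from the article: B is the backshift operator (B y_t = y_{t-1}), y_t is the observed series, \varepsilon_t is white noise, and \phi_p, \Phi_P, \theta_q, \Theta_Q are polynomials of degree p, P, q and Q. A constant term is omitted.

```latex
\phi_p(B)\,\Phi_P(B^{s})\,(1-B)^{d}\,(1-B^{s})^{D}\,y_t
  = \theta_q(B)\,\Theta_Q(B^{s})\,\varepsilon_t
```

The factor (1-B)^d is the ordinary differencing and (1-B^s)^D is the differencing at the seasonal lag s.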

5.2 Sequence stationarity test

Figure 5.2-1 Time series plot of the original sequence

This figure shows that the series fluctuated strongly in 2017, while the fluctuations from 2021 to 2022 were relatively stable. On visual inspection it can tentatively be judged to be a weakly stationary series.

Figure 5.2-2 Autocorrelation plot of the original sequence

The autocorrelation plot shows that the autocorrelation coefficients mostly stay between -0.2 and 0.2, indicating only weak short-term correlation within the series.

Table 5.2-1 Unit root test of original sequence

The p-value of the unit root test statistic in this table is less than 0.05, so the null hypothesis of a unit root is rejected and the sequence is considered stationary.

5.3 Take the first-order difference of the original sequence and test it for stationarity and white noise

1) Check the stationarity of the sequence after the first-order difference.

Figure 5.3-1 Time series plot of the sequence after the first-order difference

Figure 5.3-2 Autocorrelation plot of the sequence after the first-order difference

Table 5.3-1 Unit root test of the sequence after the first-order difference

The results show that the first-differenced sequence fluctuates fairly evenly around its mean, the autocorrelation plot shows strong short-term correlation, and the p-value of the unit root test is less than 0.05, so the sequence after the first-order difference is stationary.

2) Do a white noise test on the sequence after the first-order difference

Table 5.3-2 White noise test of sequence after first difference

The p-value in this table is far below 0.05, so the null hypothesis of white noise is rejected: the sequence after the first-order difference is stationary and not white noise, which means it still contains structure worth modeling.

5.4 Fitting the SARIMA model to the series after the first difference

The next step is order selection, that is, determining p, d, q, P, D, Q and s.

The first method: manual identification, choosing the order from the figure below.

Figure 5.4-1 Partial autocorrelation plot of the sequence after the first-order difference

The autocorrelation plot of the first-differenced sequence cuts off after lag 1, while the partial autocorrelation plot tails off, so an MA(1) model can be considered for the first-differenced sequence, that is, a SARIMA(order=(0, 1, 1)) model.

The second method: relatively optimal model identification (this is the recommended method).

Fit SARIMA(order=(p, d, q), seasonal_order=(P, D, Q, s)) for a range of parameter combinations, compute the BIC of each fitted model, and select the orders of the model with the smallest BIC.

The calculated BIC matrix is as follows:

The minimum BIC value, 143.5571, is obtained when p = 0, q = 0, P = 0, Q = 1 and s = 12. This fixes the model order.

With these orders the model has no non-seasonal AR or MA terms (p = q = 0). The model established for the original sequence is SARIMA(order=(0,0,0), seasonal_order=(0,1,1,12)), in which the differencing is applied at the seasonal lag (D = 1, s = 12) and there is one seasonal MA term. The fitted SARIMA model is analyzed as follows:

1. Parameter testing and parameter estimation are shown in the table below:

Table 5.4-1 Model parameters

2. Residual diagnostics of the fitted model:

 Figure 5.4-2 Residual autocorrelation diagram

Figure 5.4-3 Residual Partial Autocorrelation Plot 

Figure 5.4-4 Residual Partial Autocorrelation Plot

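DW test:

The DW (Durbin-Watson) statistic lies between 0 and 4. A value close to 0 indicates positive first-order autocorrelation, a value close to 4 indicates negative first-order autocorrelation, and a value close to 2 indicates no first-order autocorrelation.

The result of the DW test is 1.3132. This is noticeably below 2, so it hints at some positive first-order autocorrelation in the residuals rather than clearly ruling it out; the white noise test of the residuals below gives a more formal check.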
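For reference, the DW statistic is the sum of squared successive differences of the residuals divided by the sum of squared residuals. The NumPy sketch below (the function name `durbin_watson_stat` is chosen here for illustration) shows the calculation on two extreme toy inputs; statsmodels also provides `durbin_watson` in `statsmodels.stats.stattools`.

```python
import numpy as np

def durbin_watson_stat(resid):
    """Durbin-Watson statistic: sum((e_t - e_{t-1})^2) / sum(e_t^2)."""
    resid = np.asarray(resid, dtype=float)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

# Perfectly alternating residuals: strong negative autocorrelation, value above 2
print(durbin_watson_stat([1.0, -1.0, 1.0, -1.0]))  # 3.0

# Constant residuals: strong positive autocorrelation, value of 0
print(durbin_watson_stat([1.0, 1.0, 1.0, 1.0]))    # 0.0
```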

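The white noise test of the residual sequence gives a test statistic of 8.01950276 and a p-value of 0.00462763. Since p < 0.05, the null hypothesis that the residuals are white noise is rejected, which means some autocorrelation remains in the residuals that the model has not captured.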

5.5 SARIMA model prediction

The SARIMA(order=(0,0,0), seasonal_order=(0,1,1,12)) model is applied to forecast the project data for the next 5 months. The results are shown in the following table:

Table 5.5-1 Forecast data for the next 5 months

6. Conclusion and Outlook

To sum up, this article uses the seasonal autoregressive moving average model to build a time series analysis model and selects the model orders by minimizing the BIC. The selected model can be used for short-term forecasting, although the residual diagnostics (DW = 1.3132, white noise test p < 0.05) indicate that some autocorrelation remains and the model could be refined further.

The materials and project resources for this hands-on machine learning project are available here:

Project Description:
Link: https://pan.baidu.com/s/1dW3S1a6KGdUHK90W-lmA4w 
Extraction code: bcbp

Origin blog.csdn.net/weixin_42163563/article/details/127800645