[Statistics Notes] (12) Time series analysis and prediction


Time series data describe how a phenomenon changes over time.

A time series is a sequence formed by arranging successive observations of the same phenomenon in chronological order. Most economic data are given in the form of time series.

Time series and its decomposition

Time series can be divided into stationary and non-stationary series.

A stationary series has essentially no trend. Its observations fluctuate around a fixed level. The size of the fluctuation may differ from period to period, but it follows no particular pattern and can be regarded as random, as shown in the figure.

A non-stationary series contains a trend, seasonality, or cyclicity. It may contain only one of these components or several of them. Non-stationary series can therefore be divided into series with a trend, series with a trend and seasonality, and composite series made up of several components.

Trend: a sustained rise or decline in a time series over a long period, also known as the long-term trend. The trend in a time series can be linear or nonlinear.

Seasonality, also called seasonal fluctuation, is a periodic fluctuation of a time series that repeats within a year. For example, in business one often hears of a "peak sales season" or an "off-season". In essence, it refers to a periodic change.

Sequences containing seasonal components may or may not contain trends.

Cyclicity, also known as cyclical fluctuation, is a wave-like or oscillating movement of a time series around its long-term trend. It differs from a trend: it is not a sustained movement in a single direction but an alternation of rises and falls. It also differs from seasonal fluctuation: seasonal fluctuation follows a fairly fixed pattern and its cycle is mostly one year, whereas cyclical fluctuation follows no fixed pattern, its cycle is usually longer than one year, and the cycle lengths differ. Cyclicity is usually caused by changes in the business and economic environment.

In addition, some accidental factors affect the time series and cause it to show random fluctuations. The accidental fluctuations that remain after removing the trend, cyclicity and seasonality from the time series are called randomness, also known as irregular variation.

The components of a time series can be divided into four types: trend (T), seasonality or seasonal variation (S), cyclicity or cyclical fluctuation (C), and randomness or irregular variation (I).

Constituent elements: long-term trend, seasonal variation, cyclical variation, irregular variation.
1) Long-term trend (T): the general direction of change that a phenomenon follows over a long period under the influence of some fundamental factor.
2) Seasonal variation (S): regular periodic change that recurs within a year as the seasons change.
3) Cyclical variation (C): regular wave-like fluctuation with a cycle of several years.
4) Irregular variation (I): change with no regularity, including purely random variation and sudden irregular changes with a large impact.


Time series type

Absolute time series


Period series: a time series formed from totals that each cover a period of time.
The main features of the period series are:
1) The indicator values in the series are additive.
2) The size of each indicator value is directly related to the length of the period it covers.
3) Each indicator value is usually obtained through continuous registration and aggregation.


Time-point series: a time series formed from totals that each refer to a point in time.
The main features of the time-point series are:
1) The indicator values in the series are not additive.
2) The size of each indicator value is not directly related to the length of the interval between time points.
3) Each indicator value is usually obtained through a one-off registration made at regular intervals.

Relative time series

A time series formed by a series of relative indicators of the same kind arranged in time sequence is called a relative number time series.


Average time series

An average time series is a time series formed by arranging a series of average indicators of the same kind in chronological order.

Issues to be noted when compiling time series data

Ensure that the indicator values for each period in the series are comparable

  • The period lengths should be consistent
  • The overall scope should be consistent
  • The economic content of the indicators should be consistent
  • The calculation method should be consistent
  • The calculation prices and units of measurement should be comparable

 


Growth rate analysis

Growth rate analysis describes how much a phenomenon changes between different times. Because the base period used for comparison can differ, the growth rate can be calculated in different ways.

(1) Growth rate: the ratio of the observation in the reporting period to the observation in the base period, minus 1, expressed in %. Depending on the base period used, it is either a chain growth rate or a fixed-base growth rate.

Chain growth rate: the ratio of the observation in the reporting period to the observation in the previous period, minus 1. It shows how much the phenomenon grows from one period to the next.

Fixed-base growth rate: the ratio of the observation in the reporting period to the observation in a chosen fixed period, minus 1. It shows the overall growth of the phenomenon over the whole observation period.

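Let the growth rate be G, let Y_i be the observation in period i (i = 1, 2, ..., n), and let Y_0 be the observation in the fixed base period.

Chain growth rate:

G_i = \frac{Y_i}{Y_{i-1}} - 1

Fixed-base growth rate:

G_i = \frac{Y_i}{Y_0} - 1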

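(2) Average growth rate (average rate of increase): the geometric mean of the period-by-period chain ratios (chain development rates) of the time series, minus 1. With Y_i the observation in period i and Y_0 the observation in the first (base) period:

\bar{G} = \sqrt[n]{\frac{Y_1}{Y_0} \times \frac{Y_2}{Y_1} \times \cdots \times \frac{Y_n}{Y_{n-1}}} - 1 = \sqrt[n]{\frac{Y_n}{Y_0}} - 1

   n: the number of chain ratios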

(3) Problems that should be paid attention to in the analysis of growth rate

  • When the observations in the time series include 0 or negative numbers, it is not appropriate to calculate growth rates. For such a series the growth rate is either mathematically undefined or has no meaningful practical interpretation. Use the absolute values for analysis instead.
  • In some cases one cannot look at growth rates alone; the growth rate must be combined with the absolute level. The growth rate is a relative value and depends on the size of the base value it is compared with. In this case, calculate the absolute value of 1% growth to overcome the limitations of growth rate analysis. It represents the absolute amount of increase that corresponds to each percentage point of growth: absolute value of 1% growth = previous period's level / 100
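The growth rate definitions above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names and the `sales` figures are made up for the example, and the series is assumed to be strictly positive, as the first note above requires.

```python
from math import prod

def chain_growth_rates(y):
    """Period-over-period growth: Y_i / Y_{i-1} - 1 for i = 1..n."""
    return [y[i] / y[i - 1] - 1 for i in range(1, len(y))]

def fixed_base_growth_rates(y):
    """Growth relative to the first observation: Y_i / Y_0 - 1."""
    return [y[i] / y[0] - 1 for i in range(1, len(y))]

def average_growth_rate(y):
    """Geometric mean of the chain ratios minus 1, i.e. (Y_n / Y_0)^(1/n) - 1."""
    n = len(y) - 1  # number of chain ratios
    ratios = [y[i] / y[i - 1] for i in range(1, len(y))]
    return prod(ratios) ** (1 / n) - 1

def absolute_value_of_one_percent(y):
    """Previous period's level / 100, for periods i = 1..n."""
    return [y[i - 1] / 100 for i in range(1, len(y))]

# Hypothetical, strictly positive series (made-up numbers)
sales = [100.0, 110.0, 121.0, 133.1]
chain = chain_growth_rates(sales)
fixed = fixed_base_growth_rates(sales)
avg = average_growth_rate(sales)
one_pct = absolute_value_of_one_percent(sales)
```

Because the example series grows by 10% every period, the chain rates and the average growth rate coincide, while the fixed-base rate accumulates over time.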

Time series prediction procedures

One of the main purposes of time series analysis is to predict the future based on existing historical data. Time series contain different components, such as trend, seasonality, periodicity, and randomness. For a specific time series, it may contain one component, or it may contain several components at the same time. The prediction methods used for time series with different components are different.

Forecasting steps:

Step 1: Determine the components included in the time series and determine the type of time series

Step 2: Find a prediction method suitable for this type of time series

Step 3: Evaluate possible prediction methods to determine the best prediction plan

Step 4: Use the best prediction plan to make predictions

1. Determine the time series components

(1) Determine the trend component

To determine whether a trend component exists, you can draw a line graph of the time series and check whether it shows a trend, and whether that trend is linear or nonlinear.

You can also use regression analysis to fit a trend line and test the significance of the regression coefficient. If the regression coefficient is significant, you can conclude that the linear trend is significant.
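As a rough sketch of the regression check, the snippet below fits y = b0 + b1 * t by ordinary least squares and computes the t-statistic of the slope. The function name and data are hypothetical. The text does not prescribe a specific test; comparing the t-statistic with a critical value of the t distribution with n - 2 degrees of freedom is one common choice.

```python
from math import sqrt

def linear_trend(y):
    """Least-squares fit of y_t = b0 + b1 * t for t = 1..n.

    Returns (b0, b1, t_stat), where t_stat is the t-statistic of the slope b1.
    """
    n = len(y)
    t = list(range(1, n + 1))
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    b1 = sxy / sxx
    b0 = y_mean - b1 * t_mean
    sse = sum((yi - (b0 + b1 * ti)) ** 2 for ti, yi in zip(t, y))
    se_b1 = sqrt(sse / (n - 2) / sxx)  # standard error of the slope
    t_stat = b1 / se_b1 if se_b1 > 0 else float("inf")
    return b0, b1, t_stat

# Hypothetical data: an upward trend with small disturbances
y = [12.1, 13.9, 16.2, 17.8, 20.1, 22.0, 23.8, 26.1]
b0, b1, t_stat = linear_trend(y)

# With n = 8 there are 6 degrees of freedom; the two-sided 5% critical
# value from a standard t table is about 2.447.
trend_is_significant = abs(t_stat) > 2.447
```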

(2) Determine the seasonal composition

To determine whether a seasonal component exists, at least two years of data are needed, recorded by quarter, month, week, or day. You can draw a folded annual time series plot: plot each year's data as a separate line on the same graph, with a horizontal axis that is only one year long and the observed values on the vertical axis. If the series contains only a seasonal component, the yearly lines in the folded plot will cross one another. If it contains both a seasonal component and a trend, the yearly lines will not cross: with an upward trend, each later year's line lies above the previous year's, and with a downward trend it lies below.

2. Select the prediction method

After determining the type of the time series, select an appropriate prediction method. Forecasting from time series data usually assumes that past patterns will continue into the future, so that predictions can be based on the patterns found in the existing data. Time series prediction methods include traditional methods (the simple average method, the moving average method, exponential smoothing, etc.) and modern methods (the Box-Jenkins autoregressive moving average model, ARMA).

Generally speaking, every time series contains an irregular component. In business and management data, cyclicity is generally not considered; only the trend and seasonal components are.

A time series with no trend or seasonal component, that is, a stationary series, contains only a random component, so smoothing is enough to remove the random fluctuations. The methods used for this type of series are therefore also called smoothing methods.

3. Evaluation of prediction methods
When selecting a specific method for prediction, it is necessary to evaluate the prediction effect or accuracy of the method. The evaluation method is to find the gap between the predicted value and the actual value, that is, the prediction error. The optimal prediction method is the method that achieves the minimum prediction error.

Forecast error measures: mean error, mean absolute deviation, mean square error, mean percentage error, and mean absolute percentage error. The choice of measure depends on the forecaster's goals and familiarity with each measure.

(1) Mean error (ME). With Y the observed value, F the predicted value, and n the number of predicted values:

ME = \frac{\sum_{i=1}^{n}(Y_i-F_i)}{n}

Since prediction errors may be positive or negative, they cancel each other out when summed, so the mean error may understate the error.

(2) Mean absolute deviation (MAD) is the average of the prediction errors after taking their absolute values:

MAD = \frac{\sum_{i=1}^{n}\left|Y_i-F_i\right|}{n}

The mean absolute deviation avoids the problem of errors cancelling each other out, so it reflects the actual size of the prediction error more accurately.

(3) Mean square error (MSE): the average error calculated after removing the sign of each error by squaring it:

MSE = \frac{\sum_{i=1}^{n}(Y_i-F_i)^2}{n}

     

(4) Mean percentage error and mean absolute percentage error

The sizes of ME, MAD, and MSE depend on the level and unit of measurement of the time series data, so they do not always reflect the quality of a prediction model; they are meaningful only when comparing different models' predictions of the same data. The mean percentage error (MPE) and mean absolute percentage error (MAPE) are different: they remove the effect of the data level and unit of measurement and are relative values that reflect the size of the error.

MPE = \frac{\sum_{i=1}^{n}\left(\frac{Y_i-F_i}{Y_i}\times 100\right)}{n}

MAPE = \frac{\sum_{i=1}^{n}\left(\frac{\left|Y_i-F_i\right|}{Y_i}\times 100\right)}{n}
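The five error measures can be computed together, as in the sketch below. The function name and the sample values are illustrative only; the percentage measures assume that every observed value is positive.

```python
def forecast_errors(actual, forecast):
    """Return ME, MAD, MSE, MPE and MAPE for paired observed/predicted values."""
    n = len(actual)
    errors = [y - f for y, f in zip(actual, forecast)]  # e_i = Y_i - F_i
    return {
        "ME": sum(errors) / n,
        "MAD": sum(abs(e) for e in errors) / n,
        "MSE": sum(e ** 2 for e in errors) / n,
        # Percentage measures divide each error by the observed value
        "MPE": sum(e / y * 100 for e, y in zip(errors, actual)) / n,
        "MAPE": sum(abs(e / y) * 100 for e, y in zip(errors, actual)) / n,
    }

# Hypothetical observed and predicted values
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 190.0, 400.0]
measures = forecast_errors(actual, forecast)
```

In this example the errors are -10, 10 and 0, so ME is zero even though two of the three forecasts miss, which is exactly the cancellation problem that MAD and MSE avoid.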

 

 

Prediction of stationary series

A stationary time series contains only a random component. Its prediction methods are the simple average method, the moving average method, and exponential smoothing.

These methods remove random fluctuations mainly by smoothing the time series, so they are also known as smoothing methods. Smoothing can be used for short-term forecasting, and also to smooth a series in order to describe its trend (linear or nonlinear).

(1) Simple average method: forecast the next period's value as the simple average of the observations available up to period t. Let the t observations of the time series be Y_1, Y_2, ..., Y_t. Then the forecast for period t + 1 is:

F_{t+1} = \frac{1}{t}(Y_1+Y_2+\cdots +Y_t) = \frac{1}{t}\sum_{i=1}^{t}Y_i

Once period t + 1 has passed and its actual value is known, the forecast error for period t + 1 is: e_{t+1}=Y_{t+1}-F_{t+1}

The forecast for period t + 2 is:

F_{t+2} = \frac{1}{t+1}(Y_1+Y_2+\cdots +Y_t+Y_{t+1}) = \frac{1}{t+1}\sum_{i=1}^{t+1}Y_i

The simple average method is suitable for a relatively stable time series, that is, one with no trend. If the time series has a trend or a seasonal component, the method's forecasts are not accurate enough. The simple average method also treats distant and recent values as equally important for the future, whereas for forecasting purposes recent values matter more than distant ones, which is another reason its results are not accurate enough.
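A minimal sketch of the simple average method follows. It produces the one-step-ahead forecast after each new observation and the corresponding forecast errors; the function name and data are hypothetical.

```python
def simple_average_forecasts(y):
    """One-step-ahead forecasts: F_{t+1} is the mean of Y_1..Y_t.

    Element i of the result is the forecast for period i + 2 (1-based),
    so the list covers periods 2..n+1.
    """
    forecasts = []
    running_sum = 0.0
    for t, value in enumerate(y, start=1):
        running_sum += value
        forecasts.append(running_sum / t)
    return forecasts

# Hypothetical stationary-looking series
y = [10.0, 12.0, 11.0, 13.0]
f = simple_average_forecasts(y)

# e_{t+1} = Y_{t+1} - F_{t+1}, available once the next value is observed
errors = [y[i + 1] - f[i] for i in range(len(y) - 1)]
next_forecast = f[-1]  # forecast for the period after the last observation
```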

(2) Moving average method: a forecasting method that moves through the time series period by period, averaging as it goes, and uses the average as the forecast. There are the simple moving average and the weighted moving average.

The simple moving average uses the average of the latest k periods of data as the forecast for the next period. If the moving interval is k (1 < k < t), the moving average for period t is:

\bar{Y}_t = \frac{Y_{t-k+1}+Y_{t-k+2}+\cdots +Y_{t-1}+Y_t}{k}

This is the result of smoothing the time series. The smoothed values describe the pattern or trend of the series and can also be used for forecasting.

The simple moving average forecast for period t + 1 is:

F_{t+1} = \bar{Y}_t = \frac{Y_{t-k+1}+Y_{t-k+2}+\cdots +Y_{t-1}+Y_t}{k}

The simple moving average forecast for period t + 2 is:

F_{t+2} = \bar{Y}_{t+1} = \frac{Y_{t-k+2}+Y_{t-k+3}+\cdots +Y_t+Y_{t+1}}{k}

The moving average method uses only the latest k periods of data, where k is the moving interval. It too is mainly suitable for a relatively stable time series. The key to applying it is choosing a reasonable moving interval k, because different intervals give different prediction accuracy for the same time series. You can choose the interval by trial, taking the one that minimizes the mean square error. A small interval reacts quickly to changes but cannot show the trend of change; a large interval shows the trend, but the forecasts lag noticeably behind the data.

The basic idea of the moving average method: moving averages remove or reduce the random variation in the time series caused by accidental factors. The method is suitable for short-term forecasting.
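The following sketch computes simple moving average forecasts and picks the moving interval k by trial, using the mean square error as suggested above. The function names, the data, and the candidate range for k are assumptions made for illustration.

```python
def moving_average_forecasts(y, k):
    """F_{t+1} = mean of the latest k observations Y_{t-k+1}..Y_t.

    Returns a dict that maps each 1-based forecast period to its forecast.
    """
    return {t + 1: sum(y[t - k:t]) / k for t in range(k, len(y) + 1)}

def mse_for_interval(y, k):
    """Mean square error of the forecasts that can be checked against y."""
    forecasts = moving_average_forecasts(y, k)
    errors = [y[p - 1] - f for p, f in forecasts.items() if p <= len(y)]
    return sum(e ** 2 for e in errors) / len(errors)

# Hypothetical series
y = [10.0, 12.0, 11.0, 13.0, 12.0, 14.0, 13.0, 15.0]

# Try several moving intervals and keep the one with the smallest MSE
candidates = [2, 3, 4]
best_k = min(candidates, key=lambda k: mse_for_interval(y, k))
next_forecast = moving_average_forecasts(y, best_k)[len(y) + 1]
```

Note that each k is scored on a slightly different set of periods, because a larger k needs more history before its first forecast; a stricter comparison would score all candidates on the same periods.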

(3) Exponential smoothing forecasts with a weighted average of past observations: the forecast for period t + 1 is a weighted average of the actual observation in period t and the forecast for period t. It developed from the moving average method and is an improved weighted average method. Without discarding historical data, it gives greater weight to data closer to the forecast period, and the weights decrease exponentially from the most recent observation back to the most distant one, hence the name exponential smoothing. There are single, double, and triple exponential smoothing.

Single exponential smoothing has only one smoothing coefficient, and the further an observation is from the forecast period, the smaller its weight. It uses a linear combination of the forecast and the observation in period t as the forecast for period t + 1. The prediction model is

F_{t+1}=\alpha Y_t+(1-\alpha )F_t

\alpha: smoothing coefficient (0 \leq \alpha \leq 1)

That is, the forecast for period t + 1 is a weighted average of the actual observation in period t and the forecast for period t. Period 1 has no earlier forecast, so the forecast for period 1 is set equal to the observation for period 1: F_1 = Y_1

Forecast for period 2: F_{2}=\alpha Y_1+(1-\alpha )F_1=\alpha Y_1+(1-\alpha )Y_1=Y_1

Forecast for period 3: F_{3}=\alpha Y_2+(1-\alpha )F_2=\alpha Y_2+(1-\alpha )Y_1

Forecast for period 4: F_{4}=\alpha Y_3+(1-\alpha )F_3=\alpha Y_3+\alpha (1-\alpha )Y_2+(1-\alpha )^2Y_1

The prediction accuracy of exponential smoothing is measured by the mean square error. To see how the forecast is updated, rewrite the model as:

F_{t+1}=\alpha Y_t+(1-\alpha )F_t

          =F_t+\alpha (Y_t-F_t)

That is, the forecast for period t + 1 is the forecast for period t plus \alpha times the forecast error of period t (Y_t-F_t).

When using exponential smoothing, the key issue is to choose an appropriate smoothing coefficient \alpha; different values of \alpha affect the prediction results differently.

When \alpha = 0, the forecast simply repeats the previous period's forecast; when \alpha = 1, the forecast is the previous period's actual value.

The closer \alpha is to 1, the more promptly the model responds to changes in the time series, because it gives the current actual value a greater weight than the current forecast.

The closer \alpha is to 0, the greater the weight given to the current forecast, and the slower the model responds to changes in the time series.

When the time series has large random fluctuations, choose a larger \alpha to keep up with recent changes quickly; when the time series is relatively stable, choose a smaller \alpha.

In practice the prediction error must also be considered, with the mean square error measuring its size. To determine \alpha, make predictions with several candidate values and take the one with the smallest prediction error as the final \alpha.

Like the moving average method, exponential smoothing can also be used to smooth a time series, removing random fluctuations and revealing the trend of the series.

When predicting with single exponential smoothing, \alpha is generally no greater than 0.5. If \alpha has to exceed 0.5 before the forecasts come close to the actual values, this usually indicates that the series has some trend or fluctuates too much.

Damping coefficient: \beta = 1-\alpha. The smaller the damping coefficient, the greater the influence of recent actual values on the forecast, and vice versa. The damping coefficient is chosen according to how the time series changes.
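Single exponential smoothing and the trial-and-error choice of \alpha can be sketched as follows. This is an illustrative implementation that starts from F_1 = Y_1 as in the text; the function names, the data, and the candidate values of \alpha are assumptions.

```python
def exponential_smoothing(y, alpha):
    """Return [F_1, ..., F_{n+1}] with F_1 = Y_1 and
    F_{t+1} = alpha * Y_t + (1 - alpha) * F_t."""
    forecasts = [y[0]]
    for value in y:
        forecasts.append(alpha * value + (1 - alpha) * forecasts[-1])
    return forecasts

def smoothing_mse(y, alpha):
    """Mean square error over periods 2..n (period 1 is skipped: F_1 = Y_1)."""
    f = exponential_smoothing(y, alpha)
    errors = [y[t] - f[t] for t in range(1, len(y))]
    return sum(e ** 2 for e in errors) / len(errors)

# Hypothetical series
y = [10.0, 12.0, 11.0, 13.0, 12.0, 14.0, 13.0, 15.0]

# Try several smoothing coefficients and keep the one with the smallest MSE
candidates = [0.1, 0.3, 0.5]
best_alpha = min(candidates, key=lambda a: smoothing_mse(y, a))
next_forecast = exponential_smoothing(y, best_alpha)[-1]  # F_{n+1}
```

Setting `alpha` to 0 or 1 in `exponential_smoothing` reproduces the two limiting cases described above: a constant forecast equal to Y_1, and a forecast equal to the previous actual value.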

Exponential smoothing

The exponential smoothing method is actually a special weighted moving average method.

Its characteristics are:

First, exponential smoothing further strengthens the effect of recent observations on the forecast. Observations from different times receive different weights, with recent observations weighted more heavily, so that the forecast can quickly reflect actual changes in the market. The weights decrease as a geometric series whose first term is the smoothing constant a and whose common ratio is (1 - a).

Second, the weights that exponential smoothing gives to the observations are flexible: different values of a change how quickly the weights decay. If a takes a large value (close to 1), the weights decay more rapidly, and the recent movement of the observations is reflected more quickly in the exponentially smoothed average. Therefore, by choosing different values of a, you can adjust how strongly the method smooths the observations (that is, how stable the resulting trend is).
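To make the geometric weights concrete, the short sketch below lists the weights a, a(1 - a), a(1 - a)^2, ... that exponential smoothing places on the most recent observations for two values of a, so that the rate of decay can be compared directly. The function name and the chosen values of a are illustrative.

```python
def smoothing_weights(a, periods):
    """Weights on the latest `periods` observations, newest first:
    a, a(1 - a), a(1 - a)^2, ..."""
    return [a * (1 - a) ** j for j in range(periods)]

weights_small_a = smoothing_weights(0.2, 5)
weights_large_a = smoothing_weights(0.8, 5)

# Each weight is (1 - a) times the previous one (the common ratio)
ratio_small_a = weights_small_a[1] / weights_small_a[0]
ratio_large_a = weights_large_a[1] / weights_large_a[0]

# The weight not covered by the latest `periods` observations is (1 - a)^periods
remaining_small_a = 1 - sum(weights_small_a)
remaining_large_a = 1 - sum(weights_large_a)
```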

According to the number of smoothing passes, exponential smoothing is divided into single, double, and triple exponential smoothing.


Forecasting of trend series

Trend forecasting analysis, also known as time series forecasting analysis, is a forecasting method based on the principle of continuity in business development: it applies mathematical statistics to past historical data arranged in chronological order and then uses a numerical model to estimate the production (sales) volume or production (sales) amount for the planning period.
Depending on the mathematical method used to forecast the sales volume or sales amount for the planning period, trend forecasting can be divided into:

  • Arithmetic average method. Take the arithmetic average of the sales volume (or sales amount) over the past several periods as the sales forecast for the planning period. The advantage of this method is that the calculation is simple. Because it takes an average, however, it is relatively rough, and the forecast may differ considerably from the actual figure, so the method is only suitable for products with relatively stable sales, such as non-seasonal food and daily necessities.
  • Moving weighted average method. Weight the sales volume (or sales amount) of the past several periods according to how far each period is from the planning period (recent periods get larger weights, distant periods smaller ones), then take the weighted average as the sales forecast for the planning period. "Moving" means that the periods used to calculate the average advance step by step.
  • Exponential smoothing. When forecasting the sales volume (or sales amount) for the planning period, a smoothing coefficient (or weighting factor) is introduced into the calculation. Exponential smoothing is essentially similar to the moving weighted average method. Its advantage is that it can remove the influence of accidental factors contained in actual sales. However, the choice of the smoothing coefficient involves some subjectivity. The larger the smoothing coefficient, the greater the influence of recent actual figures on the forecast, and vice versa. A smaller smoothing coefficient can therefore be used so that the smoothed average reflects the long-term trend of the observations, or a larger smoothing coefficient so that it reflects their recent movement, for short-term sales forecasts.
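The moving weighted average method can be sketched as below. The weights 1, 2, 3 and the sales figures are hypothetical; the text only requires that more recent periods receive larger weights, so any increasing set of weights would serve.

```python
def weighted_moving_average_forecast(y, weights):
    """Forecast the next period from the latest len(weights) observations.

    `weights` are ordered from oldest to newest and are normalised by
    their sum, so they do not have to add up to 1.
    """
    k = len(weights)
    recent = y[-k:]
    return sum(w * v for w, v in zip(weights, recent)) / sum(weights)

# Hypothetical sales history; the newest period gets the largest weight
sales = [120.0, 130.0, 125.0, 140.0, 150.0]
forecast = weighted_moving_average_forecast(sales, [1, 2, 3])

# "Moving": once the next actual value arrives, the window advances by one
sales.append(145.0)
next_forecast = weighted_moving_average_forecast(sales, [1, 2, 3])
```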

 


Decomposition prediction of compound sequence

A compound sequence is a sequence that contains trend, seasonal, cyclical, and random components.

Such a sequence is usually predicted by separating out the components of the time series one by one and then making predictions.

Because analysing the cyclical component requires many years of data, which are hard to obtain in practice, the decomposition model used is:

\large Y_{t} = T_{t}\times S_{t} \times I_{t}

This model indicates that the time series contains trend components, seasonal components, and random components.

The prediction methods of this kind of sequence mainly include seasonal multiple regression model, seasonal autoregressive model and time series decomposition method prediction.

Decomposition prediction is usually carried out in the following steps:

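Step 1: Identify and separate the seasonal component.

             Calculate the seasonal index, then divide each observation by the corresponding seasonal index to remove the seasonality.

Step 2: Establish a prediction model and make predictions.

             Fit a suitable prediction model to the series with the seasonal component removed, and forecast with it.

Step 3: Calculate the final predicted value.

             Multiply the predicted value by the corresponding seasonal index to get the final predicted value.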
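The three steps can be sketched as follows under two assumptions that go beyond the text: the seasonal indices are already known, and the model fitted in step 2 is a linear trend (the text leaves the choice of model open). The function name and the synthetic data are hypothetical.

```python
def decomposition_forecast(y, seasonal_index, horizon):
    """Forecast with Y = T x S x I: deseasonalise, fit a linear trend,
    then multiply the trend forecast by the seasonal index."""
    period = len(seasonal_index)
    n = len(y)
    # Step 1: remove the seasonal component (observation i has season i % period)
    deseason = [y[i] / seasonal_index[i % period] for i in range(n)]
    # Step 2: least-squares linear trend on the deseasonalised series, t = 1..n
    t = list(range(1, n + 1))
    t_mean = sum(t) / n
    d_mean = sum(deseason) / n
    b1 = (sum((ti - t_mean) * (di - d_mean) for ti, di in zip(t, deseason))
          / sum((ti - t_mean) ** 2 for ti in t))
    b0 = d_mean - b1 * t_mean
    # Step 3: trend forecast times the seasonal index of the forecast period
    return [(b0 + b1 * (n + h)) * seasonal_index[(n + h - 1) % period]
            for h in range(1, horizon + 1)]

# Synthetic quarterly data: trend 100 + 2t times an assumed seasonal pattern
pattern = [0.8, 1.0, 1.3, 0.9]
y = [(100 + 2 * t) * pattern[(t - 1) % 4] for t in range(1, 9)]
forecast = decomposition_forecast(y, pattern, horizon=2)
```

Because the synthetic series has no random component, the forecasts recover the underlying trend and pattern almost exactly; real data would leave residual error.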

The seasonal index characterizes the typical seasonal behaviour of the sequence in each month or quarter of the year.

There are many ways to calculate the seasonal index, such as the moving average trend elimination method.

The moving average trend elimination method determines the seasonal pattern in the following basic steps:

First, from each year's quarterly (or monthly) data (Y), calculate the four-quarter (or 12-month) moving average, then calculate a two-term moving average of successive values to "center" it, which gives the long-term trend value (T).

Second, divide the actual value (Y) by the corresponding moving average (T) to obtain Y / T for each period. This is the time series with the effect of the long-term trend removed; it is a relative number, called the seasonal ratio. The results are the values in the fourth column of the table.

Third, rearrange the Y / T values and apply the same-period average method to them: first calculate the average of the same quarter (or month) across the years, then calculate the overall average of these quarterly (or monthly) averages, that is, the average of the new series after the long-term trend has been removed; finally, divide each quarterly (or monthly) average by the overall average to obtain the seasonal index, and draw a graph.
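A compact sketch of these three steps for quarterly data is given below. The function names and the synthetic series are assumptions for illustration, and the code handles only an even period (4 or 12), for which the two-term centering step applies.

```python
def centered_moving_average(y, period=4):
    """Centered moving average for an even period: a `period`-term moving
    average followed by a 2-term moving average. Returns {index: T}."""
    half = period // 2
    first = [sum(y[i:i + period]) / period for i in range(len(y) - period + 1)]
    # first[i] falls between positions i + half - 1 and i + half; averaging
    # first[i] and first[i + 1] centres the value on position i + half.
    return {i + half: (first[i] + first[i + 1]) / 2 for i in range(len(first) - 1)}

def seasonal_indices(y, period=4):
    trend = centered_moving_average(y, period)           # step 1: T
    ratios = {i: y[i] / t for i, t in trend.items()}     # step 2: Y / T
    by_season = {s: [] for s in range(period)}           # step 3: same-period averages
    for i, r in ratios.items():
        by_season[i % period].append(r)
    season_means = [sum(by_season[s]) / len(by_season[s]) for s in range(period)]
    overall = sum(season_means) / period
    return [m / overall for m in season_means]           # indices average to 1

# Synthetic data: three years of quarters, trend 100 + 2t times an assumed pattern
pattern = [0.8, 1.0, 1.3, 0.9]
y = [(100 + 2 * t) * pattern[t % 4] for t in range(12)]
indices = seasonal_indices(y)
```

The centered moving average loses half a period of values at each end of the series, so at least two full years of data are needed before every quarter has a ratio to average.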


Seasonal fluctuations

Seasonal fluctuations are the regular, periodic changes that certain social and economic phenomena (certain time series) show within a year as the seasons change, under the influence of social and natural factors. For example, the production and consumption of food, clothing, and other strongly seasonal products fluctuate periodically with the seasons, with a "peak season" and an "off season".

Seasonal fluctuations have three obvious characteristics:
(1) Seasonal fluctuations have a certain regularity and periodicity;
(2) Seasonal fluctuations recur every year and have repetition;
(3) Seasonal fluctuations have similar trajectories. 

There are many methods for measuring seasonal fluctuations. The two most commonly used are the same-period average method and the trend elimination method.
Same-period average method
This is the simplest way to measure seasonal fluctuations. It uses several years of data to find the average level of the same month (quarter) across the years and the overall monthly (quarterly) average, and compares the two to obtain the seasonal index of each month (quarter), which shows the degree of seasonal fluctuation. The same-period average method comes in two forms: the direct monthly (quarterly) average method and the ratio monthly (quarterly) average method.


1. Direct monthly (quarterly) average method
The direct monthly (quarterly) average method treats the trend value of the whole time series as a constant. The calculation steps are as follows:
(1) Calculate the average of the same month (quarter) across the years;
(2) Calculate the overall average of all months (or quarters) across the years;
(3) Calculate the seasonal index by dividing each same-month (same-quarter) average by the overall average.
2. Ratio monthly (quarterly) average method
This method first divides each year's monthly (quarterly) data by that year's monthly (quarterly) average to obtain the seasonal ratios for the year, and then averages the ratios for the same period (month or quarter) across the years to obtain the seasonal index.
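Both variants of the same-period average method can be sketched as follows. The function names and the two years of quarterly figures are hypothetical; each inner list holds one year of data.

```python
def direct_seasonal_indices(years):
    """Direct method: same-period average divided by the overall average."""
    period = len(years[0])
    season_means = [sum(year[s] for year in years) / len(years)
                    for s in range(period)]
    overall = sum(season_means) / period
    return [m / overall for m in season_means]

def ratio_seasonal_indices(years):
    """Ratio method: divide each year's values by that year's average,
    then average the ratios for the same period across years."""
    period = len(years[0])
    ratios = [[v / (sum(year) / period) for v in year] for year in years]
    return [sum(r[s] for r in ratios) / len(ratios) for s in range(period)]

# Hypothetical quarterly data for two years
quarters = [
    [80.0, 100.0, 130.0, 90.0],
    [88.0, 110.0, 143.0, 99.0],
]
direct = direct_seasonal_indices(quarters)
ratio = ratio_seasonal_indices(quarters)
```

In this example the second year is the first year scaled by 1.1, so the two methods agree. They generally differ when the level changes between years, because the direct method lets high-level years dominate the averages while the ratio method gives every year equal weight.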


Moving average trend elimination method
In a series with an obvious long-term trend, the trend must be removed before seasonal fluctuations can be measured. Assuming that the effects of trend, seasonal fluctuation, cyclical fluctuation, and irregular variation on the time series can be described by a multiplicative model, the steps for measuring seasonal fluctuations with the moving average trend elimination method are as follows:
(1) Calculate the moving average of the original time series and use it as the trend value for the corresponding period.
(2) Remove the trend from the original series, that is, divide each value of the original series by the moving average for the same period.
(3) Calculate the seasonal index from the series with the trend removed, and use it to measure the seasonal fluctuation.

 


Origin blog.csdn.net/seagal890/article/details/105588571