Time series analysis and forecasting (reprint)

First, time series and its decomposition
A time series is a sequence of successive observations of the same phenomenon arranged in time order. Depending on the phenomenon observed, the time unit can be a year, a quarter, a month, or any other interval.

Time series can be classified as follows:

(1) Stationary series
A stationary series shows essentially no trend: its values fluctuate around a roughly fixed level. The size of the fluctuations may differ from period to period, but they follow no regular pattern and can be regarded as random.

(2) Non-stationary series
A non-stationary series contains a trend, seasonality, or cyclical behavior. It may contain only one of these components or a combination of several. Non-stationary series can therefore be divided into trend series, seasonal series, and composite series in which several components are mixed together.

Trend: a sustained rise or decline that the series shows over a long period, also called the long-term trend. The trend in a time series may be linear or nonlinear.

Seasonality (seasonal fluctuation): fluctuations that recur periodically within a year. For example, sales have peak and off seasons and tourism has peak and off-peak periods, because of changes in the seasons. "Season" here does not refer only to the four seasons of the year; it refers to any regular periodic change of this kind. A series with a seasonal component may or may not also contain a trend.

Cyclicity (cyclical fluctuation): a wave-like or oscillating movement of the series around its long-term trend. Cycles are often driven by business and economic activity. Unlike a trend, a cycle is not a continuous movement in a single direction but an alternation of rises and falls. Unlike seasonality, whose pattern is relatively fixed and whose period usually lies within one year, cyclical fluctuations have no fixed pattern, usually span more than one year, and vary in length. Cyclicity is usually caused by changes in the economic environment.

In addition, chance factors also affect the series and produce random fluctuations. The fluctuations that remain after the trend, seasonality, and cyclicity have been removed are called the random component, also known as irregular variation.

The components of a time series can be grouped into four types: trend (T), seasonality or seasonal variation (S), cyclical fluctuation (C), and irregular or random fluctuation (I). A key task of traditional time series analysis is to separate these components from the series, express the relationship among them with a mathematical model, and then analyze each one separately. Depending on how the four components act on the series, it can be decomposed with different models, mainly the additive model (Y_t=T_t+S_t+C_t+I_t) and the multiplicative model. Multiplicative model: Y_t=T_t\times S_t\times C_t\times I_t

Second, descriptive analysis
1, graphical description

2, growth rate analysis

Growth rate analysis describes how a phenomenon changes from one period to another. Because the base period used for comparison can differ, there are different ways of calculating growth rates.

(1) Growth rate: the ratio of the reporting-period observation to the base-period observation, minus 1, expressed as a percentage. Depending on the base period chosen, it is divided into the chain growth rate and the fixed-base growth rate.

Chain growth rate: the ratio of the reporting-period value to the value of the immediately preceding period, minus 1. It shows the degree of change from one period to the next.

Fixed-base growth rate: the ratio of the reporting-period value to the value of a fixed base period, minus 1. It shows the total change over the whole observation period.

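Let Y_0 be the observed value of the base period, Y_i the observed value of period i, and G_i the growth rate. Chain growth rate:

G_i=\frac{Y_i}{Y_{i-1}}-1 \quad (i=1,2,\cdots,n)

Fixed-base growth rate:

G_i=\frac{Y_i}{Y_0}-1 \quad (i=1,2,\cdots,n)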

(2) Average growth rate (average rate of increase): the geometric mean of the chain ratios Y_i/Y_{i-1} of the series, minus 1:

\bar{G}=\sqrt[n]{\frac{Y_1}{Y_0}\times\frac{Y_2}{Y_1}\times\cdots\times\frac{Y_n}{Y_{n-1}}}-1=\sqrt[n]{\frac{Y_n}{Y_0}}-1

where n is the number of chain ratios.
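A minimal sketch of the three growth-rate formulas, assuming the series is a plain Python list whose first element is the base period Y_0. The function names and the sales figures are hypothetical, chosen so that every period grows by exactly 10%.

```python
def chain_growth(ys):
    """Chain growth rates G_i = Y_i / Y_{i-1} - 1 for i = 1..n."""
    return [ys[i] / ys[i - 1] - 1 for i in range(1, len(ys))]


def fixed_base_growth(ys):
    """Fixed-base growth rates G_i = Y_i / Y_0 - 1 for i = 1..n."""
    return [y / ys[0] - 1 for y in ys[1:]]


def average_growth(ys):
    """Average growth rate: n-th root of the product of the n chain ratios, minus 1."""
    n = len(ys) - 1                   # number of chain ratios
    product = 1.0
    for i in range(1, len(ys)):
        product *= ys[i] / ys[i - 1]  # the product telescopes to Y_n / Y_0
    return product ** (1 / n) - 1


# Hypothetical data: Y_0 is the base period, each later period grows by 10%.
sales = [100.0, 110.0, 121.0, 133.1]
print([round(g, 4) for g in chain_growth(sales)])       # [0.1, 0.1, 0.1]
print([round(g, 4) for g in fixed_base_growth(sales)])  # [0.1, 0.21, 0.331]
print(round(average_growth(sales), 4))                  # 0.1
```

Because the chain ratios telescope, the average growth rate depends only on the first and last observations, which is why the shortcut \sqrt[n]{Y_n/Y_0}-1 gives the same result.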

(3) Issues to note in growth rate analysis

i: When the series contains zero or negative observations, growth rates should not be calculated. For such a series the computed growth rate either violates mathematical conventions or has no meaningful interpretation; analyze the absolute values instead.

ii: In some cases growth rates should not be read on their own but together with the absolute levels. A growth rate is a relative value whose size depends on the base value it is compared against, so a high growth rate on a small base may correspond to a small absolute increase. In such cases, compute the absolute value of 1% growth to overcome this limitation of growth rate analysis:

The absolute value of 1% growth is the absolute amount represented by each percentage point of growth: absolute value of 1% growth = previous-period level / 100
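A small sketch of why the absolute value of 1% growth matters: two hypothetical units grow at the same 10% rate, but because their previous-period levels differ, each percentage point of growth is worth a very different absolute amount. The function name and the figures are illustrative assumptions.

```python
def value_of_one_percent(prev_level):
    """Absolute value of 1% growth = previous-period level / 100."""
    return prev_level / 100


# Hypothetical (previous level, current level) pairs with the same 10% growth rate.
for prev, cur in [(50.0, 55.0), (500.0, 550.0)]:
    growth_pct = (cur / prev - 1) * 100
    print(f"growth {growth_pct:.1f}%, each 1% is worth {value_of_one_percent(prev)}")
```

Both lines report 10.0% growth, but each percentage point represents 0.5 units in the first case and 5.0 units in the second.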

Third, time series forecasting procedure
One of the main purposes of time series analysis is to forecast the future from existing historical data. Time series may contain different components, such as trend, seasonality, cyclicity, and randomness. A particular series may contain one component or several, and series with different components call for different forecasting methods. The forecasting steps are:

Step 1: Identify the components the series contains, i.e. determine the type of time series.

Step 2: Find forecasting methods suited to that type of series.

Step 3: Evaluate the candidate methods to determine the optimal forecasting scheme.

Step 4: Use the optimal scheme to forecast.

1, identifying the components of the series
(1) determining the trend component

To determine whether a trend component is present, plot the series as a line chart and check whether it shows a trend, and whether that trend is linear or nonlinear.

Alternatively, fit a trend line by regression analysis and test the significance of the regression coefficient. If the coefficient is significant, one can conclude that the series has a significant linear trend.
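A minimal sketch of the regression check, assuming the trend is fitted as Y_t = b_0 + b_1 t with t = 1..n and tested with the usual t statistic b_1 / se(b_1) on n - 2 degrees of freedom. The function name and the data are hypothetical; the critical value 2.447 is t_{0.025}(6), which applies only because this example has n = 8.

```python
import math


def trend_slope_t(ys):
    """Fit Y_t = b0 + b1*t by least squares (t = 1..n); return (b1, t statistic of b1)."""
    n = len(ys)
    ts = range(1, n + 1)
    t_bar = (n + 1) / 2
    y_bar = sum(ys) / n
    sxx = sum((t - t_bar) ** 2 for t in ts)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * t_bar
    sse = sum((y - (b0 + b1 * t)) ** 2 for t, y in zip(ts, ys))
    se_b1 = math.sqrt(sse / (n - 2) / sxx)  # standard error of the slope
    return b1, b1 / se_b1


ys = [12.1, 13.0, 13.8, 15.2, 15.9, 17.1, 17.8, 19.2]  # hypothetical data
b1, t_stat = trend_slope_t(ys)
# Two-sided test at the 0.05 level with n - 2 = 6 degrees of freedom: critical value ~2.447.
print(round(b1, 3), round(t_stat, 2), abs(t_stat) > 2.447)
```

If |t| exceeds the critical value, the slope is significant and a linear trend is indicated; for a series that only fluctuates around a level, |t| stays small.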

(2) determining the seasonal component

Determining whether a seasonal component is present requires at least two years of data recorded by quarter, month, week, or day. One can draw a folded annual time series plot: each year's data are drawn as a separate line, the horizontal axis spans only one year, and the vertical axis shows each year's values. If the series contains only a seasonal component, the yearly lines in the folded plot will cross one another. If it contains both seasonal and trend components, the lines will not cross: with an upward trend, each later year's line lies above the previous year's; with a downward trend, it lies below.
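The folded plot can be approximated numerically. The sketch below, assuming the series starts at the first season of a year, splits the data into one row per year and checks whether each year lies entirely above or entirely below the previous one. Both function names and the quarterly data are hypothetical, and the check is only a rough stand-in for looking at the chart.

```python
def fold_by_year(values, period):
    """Split the series into consecutive years of `period` observations each."""
    return [values[i:i + period] for i in range(0, len(values) - period + 1, period)]


def yearly_lines_cross(folded):
    """True if some year's line is neither entirely above nor entirely below the previous year's."""
    for prev, cur in zip(folded, folded[1:]):
        above = all(c > p for p, c in zip(prev, cur))
        below = all(c < p for p, c in zip(prev, cur))
        if not (above or below):
            return True
    return False


seasonal_only = [10, 20, 15, 5, 11, 19, 16, 4]   # hypothetical quarterly data
with_trend = [10, 20, 15, 5, 14, 24, 19, 9]
print(yearly_lines_cross(fold_by_year(seasonal_only, 4)))  # True
print(yearly_lines_cross(fold_by_year(with_trend, 4)))     # False
```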

2, selecting a forecasting method
After determining the type of time series, select an appropriate forecasting method. Forecasting from time series data generally assumes that past patterns will continue into the future, so forecasts can be based on past or existing patterns. Time series forecasting methods include traditional methods (simple average, moving average, exponential smoothing) and modern methods (the Box-Jenkins autoregressive moving average model, ARMA).

In general, every time series contains a random component. In business and management data the cyclical component is usually not considered; only the trend and seasonal components are.

A time series without trend or seasonal components, i.e. a stationary series containing only a random component, can have its random fluctuations removed simply by smoothing. Forecasting methods for this type of series are therefore also called smoothing methods.

3, evaluating forecasting methods
When a particular forecasting method is chosen, its effectiveness, or forecast accuracy, must be evaluated. The evaluation measures the gap between the forecast and the actual value, i.e. the forecast error. The optimal forecasting method is the one that minimizes the forecast error.

Forecast errors can be measured by the mean error, mean absolute deviation, mean square error, mean percentage error, and mean absolute percentage error. Which measure to use depends on the forecasting objective and on familiarity with the methods.

(1) Mean error (ME), where Y is the observed value, F the forecast, and n the number of forecasts:

ME=\frac{\sum_{i=1}^{n}(Y_i-F_i)}{n}

Because forecast errors can be positive or negative, they cancel each other out when summed, so the mean error may understate the size of the error.

(2) Mean absolute deviation (MAD): the average of the absolute values of the forecast errors:

MAD=\frac{\sum_{i=1}^{n}|Y_i-F_i|}{n}

The mean absolute deviation avoids the problem of errors canceling each other out and therefore reflects the actual size of the forecast error accurately.

(3) Mean square error (MSE): squares the errors to remove their signs:

MSE=\frac{\sum_{i=1}^{n}(Y_i-F_i)^2}{n}

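(4) Mean percentage error (MPE) and mean absolute percentage error (MAPE):

MPE=\frac{\sum_{i=1}^{n}\left(\frac{Y_i-F_i}{Y_i}\times 100\right)}{n}

MAPE=\frac{\sum_{i=1}^{n}\left(\frac{|Y_i-F_i|}{Y_i}\times 100\right)}{n}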

ME, MAD, and MSE are affected by the level and unit of measurement of the series, so they sometimes do not truly reflect the quality of a forecasting model, and comparing models with them is meaningful only on the same data. MPE and MAPE differ in that they remove the influence of the level and unit of measurement and express the error as a relative value.
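The five error measures can be computed side by side. This sketch assumes the observations and forecasts are paired lists of equal length with nonzero observations (needed for MPE and MAPE); the function name and data are hypothetical. The data are chosen so that the errors are +2, -2, -5, +5.

```python
def forecast_errors(actual, forecast):
    """Return ME, MAD, MSE, MPE (%) and MAPE (%) for paired observations and forecasts."""
    n = len(actual)
    errs = [y - f for y, f in zip(actual, forecast)]
    return {
        "ME": sum(errs) / n,
        "MAD": sum(abs(e) for e in errs) / n,
        "MSE": sum(e * e for e in errs) / n,
        "MPE": sum(e / y * 100 for e, y in zip(errs, actual)) / n,
        "MAPE": sum(abs(e) / y * 100 for e, y in zip(errs, actual)) / n,
    }


actual = [100.0, 110.0, 90.0, 105.0]     # hypothetical observations
forecast = [98.0, 112.0, 95.0, 100.0]    # hypothetical forecasts
result = forecast_errors(actual, forecast)
for name, value in result.items():
    print(name, round(value, 3))
```

Here ME is exactly 0 even though no forecast is correct, which illustrates the cancellation problem; MAD (3.5) and MSE (14.5) do not hide the errors.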

 

 

4, forecasting stationary series
A stationary series contains only a random component. Forecasting methods: simple average, moving average, and exponential smoothing. They mainly smooth the series to eliminate random fluctuations, so they are also called smoothing methods. Smoothing can be used for short-term forecasting, and the smoothed series can also be used to describe the trend of a series (linear or nonlinear).

(1) Simple average method: forecast the next value as the simple average of the t observations already available. Given observations Y_1, Y_2, \cdots, Y_t, the forecast for period t+1 is:

F_{t+1}=\frac{1}{t}(Y_1+Y_2+\cdots+Y_t)=\frac{1}{t}\sum_{i=1}^{t}Y_i

Once period t+1 has passed and its actual value Y_{t+1} is known, the forecast error for period t+1 is: e_{t+1}=Y_{t+1}-F_{t+1}

The forecast for period t+2 is:

F_{t+2}=\frac{1}{t+1}(Y_1+Y_2+\cdots+Y_t+Y_{t+1})=\frac{1}{t+1}\sum_{i=1}^{t+1}Y_i

The simple average method suits fairly stable series, i.e. it works well when the series has no trend. If the series has a trend or seasonal component, its forecasts are not accurate enough. The method also treats distant and recent values as equally important for the future, whereas from a forecasting point of view recent values usually say more about the future than distant ones, which is another reason its forecasts are not accurate enough.
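A minimal sketch of the simple average method, assuming the observations Y_1..Y_t are held in a list; the function name and data are hypothetical. Each forecast F_{t+1} is the running mean of everything observed so far, and the errors follow e_{t+1}=Y_{t+1}-F_{t+1}.

```python
def simple_average_forecasts(ys):
    """Return [F_2, ..., F_{n+1}], where F_{t+1} is the mean of Y_1..Y_t."""
    forecasts, total = [], 0.0
    for t, y in enumerate(ys, start=1):
        total += y
        forecasts.append(total / t)   # forecast for period t + 1
    return forecasts


ys = [20.0, 22.0, 21.0, 23.0]                  # hypothetical observations Y_1..Y_4
fs = simple_average_forecasts(ys)              # forecasts for periods 2..5
errors = [y - f for y, f in zip(ys[1:], fs)]   # e_{t+1} = Y_{t+1} - F_{t+1}
print(fs)       # [20.0, 21.0, 21.0, 21.5]
print(errors)   # [2.0, 0.0, 2.0]
```

The last element of `fs` (21.5) is the forecast for period 5, for which no actual value exists yet.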

(2) Moving average: forecasts by averaging the series over a window that moves forward through time and using that average as the forecast. It includes the simple moving average and the weighted moving average.

The simple moving average uses the average of the most recent k observations as the next forecast. With moving-average interval k (1<k<t), the moving average for period t is:

\bar{Y}_t=\frac{Y_{t-k+1}+Y_{t-k+2}+\cdots+Y_{t-1}+Y_t}{k}

The result is a smoothed series whose values describe the shape or trend of the original series. It can also be used for forecasting.

Simple moving average forecast for period t+1: F_{t+1}=\bar{Y}_t=\frac{Y_{t-k+1}+Y_{t-k+2}+\cdots+Y_t}{k}

Simple moving average forecast for period t+2: F_{t+2}=\bar{Y}_{t+1}=\frac{Y_{t-k+2}+Y_{t-k+3}+\cdots+Y_{t+1}}{k}

The moving average uses only the latest k observations: each time a new average is computed, the window moves forward by one period and still spans k observations. It is also mainly suited to forecasting stationary series. The key to applying it is choosing a reasonable interval k. For the same series, different intervals give different forecast accuracy, so k can be chosen by trial, taking the interval with the smallest mean square error. A small interval responds quickly to changes but does not reveal the trend; a large interval reveals the trend, but its forecasts lag noticeably.

The basic idea of the moving average is that it eliminates or reduces the random variation in the data caused by chance factors, which makes it useful for short-term forecasting.
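The sketch below computes simple moving average forecasts and chooses k by trial, taking the interval with the smallest MSE as described above. It assumes a plain list of observations; the function names and data are hypothetical. One simplification to keep in mind: a larger k yields fewer forecasts with known actual values, so the MSEs for different k are averaged over slightly different periods.

```python
def moving_average_forecasts(ys, k):
    """Map period p = t + 1 to F_{t+1} = (Y_{t-k+1} + ... + Y_t) / k, for t = k..n (1-based)."""
    return {t + 1: sum(ys[t - k:t]) / k for t in range(k, len(ys) + 1)}


def moving_average_mse(ys, k):
    """MSE over the periods that have both a forecast and an actual value."""
    forecasts = moving_average_forecasts(ys, k)
    errors = [ys[p - 1] - f for p, f in forecasts.items() if p <= len(ys)]
    return sum(e * e for e in errors) / len(errors)


ys = [10.0, 12.0, 11.0, 13.0, 12.0, 14.0, 13.0, 15.0]   # hypothetical observations
for k in (2, 3, 4):
    print(k, round(moving_average_mse(ys, k), 3))
best_k = min((2, 3, 4), key=lambda k: moving_average_mse(ys, k))
print("next-period forecast:", moving_average_forecasts(ys, best_k)[len(ys) + 1])
```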

(3) Exponential smoothing forecasts by taking a weighted average of past observations: the forecast for period t+1 is a weighted average of the actual observation for period t and the forecast for period t. Exponential smoothing developed from the moving average and is a modified weighted-average method. It does not discard historical data, but it gives larger weights to data closer to the forecast period, and the weights decrease exponentially from recent to distant data, hence the name. Variants include single, double, and triple exponential smoothing.

Single exponential smoothing (also called simple exponential smoothing) uses only one smoothing coefficient, and the farther an observation is from the forecast period, the smaller its weight. It takes a linear combination of the observed value and the forecast for period t as the forecast for period t+1. The model is: F_{t+1}=\alpha Y_t+(1-\alpha )F_t, where \alpha is the smoothing coefficient (0\leq\alpha\leq 1).

That is, the forecast for period t+1 is a weighted average of the actual observation and the forecast for period t. The forecast for period 1 is set equal to the first observation, F_1=Y_1, so:

Forecast for period 2: F_{2}=\alpha Y_1+(1-\alpha )F_1=\alpha Y_1+(1-\alpha )Y_1=Y_1

Forecast for period 3: F_{3}=\alpha Y_2+(1-\alpha )F_2=\alpha Y_2+(1-\alpha )Y_1

Forecast for period 4: F_{4}=\alpha Y_3+(1-\alpha )F_3=\alpha Y_3+\alpha (1-\alpha )Y_2+(1-\alpha )^2Y_1

The forecast accuracy of exponential smoothing is measured by the mean square error. The model can also be rewritten as:

F_{t+1}=\alpha Y_t+(1-\alpha )F_t=F_t+\alpha (Y_t-F_t)

That is, F_{t+1} equals the period-t forecast F_t plus the period-t forecast error (Y_t-F_t) adjusted by \alpha.
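A minimal sketch of single exponential smoothing in the error-correction form F_{t+1}=F_t+\alpha (Y_t-F_t), with the initial forecast F_1=Y_1 as in the text. The function name and the three observations are hypothetical; the last line checks F_4 against the expanded formula given above.

```python
def exp_smoothing_forecasts(ys, alpha):
    """Single exponential smoothing with F_1 = Y_1.

    Returns [F_1, F_2, ..., F_{n+1}]; the last value is the forecast for the next period.
    """
    forecasts = [ys[0]]                          # F_1 = Y_1
    for y in ys:
        f = forecasts[-1]
        forecasts.append(f + alpha * (y - f))    # F_{t+1} = F_t + alpha * (Y_t - F_t)
    return forecasts


ys = [10.0, 20.0, 30.0]                          # hypothetical Y_1, Y_2, Y_3
fs = exp_smoothing_forecasts(ys, alpha=0.5)
print(fs)                                        # [10.0, 10.0, 15.0, 22.5]
# Same F_4 from alpha*Y_3 + alpha*(1-alpha)*Y_2 + (1-alpha)**2*Y_1
print(0.5 * 30.0 + 0.5 * 0.5 * 20.0 + 0.5 ** 2 * 10.0)   # 22.5
```

With alpha = 0 every forecast repeats F_1, and with alpha = 1 each forecast equals the latest observation, matching the limiting cases discussed next.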

A key issue in using exponential smoothing is choosing an appropriate smoothing coefficient \alpha; different values of \alpha affect the forecasts differently.

When \alpha=0, each forecast simply repeats the previous forecast; when \alpha=1, the forecast equals the most recent actual value.

The closer \alpha is to 1, the more promptly the model reacts to changes in the series, because it gives the current actual value more weight than the current forecast.

The closer \alpha is to 0, the more weight is given to the current forecast, and the more slowly the model reacts to changes in the series.

When the series has large random fluctuations, choose a larger \alpha to keep pace with recent changes; when the series is relatively stable, choose a smaller \alpha.

In practice, the forecast error must also be considered, with the mean square error measuring its size. To choose \alpha, forecast with several candidate values of \alpha and take the one with the smallest forecast error as the final value.
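The trial-and-error choice of \alpha can be sketched as a small grid search, assuming F_1=Y_1 and an MSE computed over periods 2..n. The function name, the grid 0.1..0.9, and the steadily rising data are hypothetical choices for illustration.

```python
def ses_mse(ys, alpha):
    """MSE of one-step-ahead single exponential smoothing forecasts, with F_1 = Y_1."""
    f = ys[0]                               # F_1 = Y_1
    squared_errors = []
    for t in range(1, len(ys)):
        f = f + alpha * (ys[t - 1] - f)     # next forecast from the previous actual and forecast
        squared_errors.append((ys[t] - f) ** 2)
    return sum(squared_errors) / len(squared_errors)


ys = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]   # hypothetical series with a steady rise
candidates = [a / 10 for a in range(1, 10)] # alpha = 0.1, 0.2, ..., 0.9
for a in candidates:
    print(a, round(ses_mse(ys, a), 3))
best_alpha = min(candidates, key=lambda a: ses_mse(ys, a))
print("alpha with smallest MSE:", best_alpha)
```

For this trending series the search picks the largest candidate, 0.9, which is consistent with the rule of thumb below that needing \alpha above 0.5 signals a trend in the data.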

Like the moving average, exponential smoothing can also be used to smooth a series, eliminating random fluctuations and revealing its trend.

When forecasting with exponential smoothing, \alpha is generally no larger than 0.5. If \alpha must exceed 0.5 for the forecasts to come close to the actual values, the series has a trend or fluctuates too much.

Damping coefficient \beta=1-\alpha: the smaller the damping coefficient, the greater the influence of recent actual values on the forecast; the larger it is, the smaller that influence. The damping coefficient is chosen according to how the series changes.

5, forecasting trend series
The trend in a time series can be linear or nonlinear. If the trend is expected to continue into the future, it can be forecast by trend extrapolation. Forecasting methods for trend series include linear trend forecasting, nonlinear trend forecasting, and autoregressive model forecasting.

(1) linear trend prediction

A linear trend is a steady, linear increase or decrease of a phenomenon over time.

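Trend equation: \hat{Y}_t=b_0+b_1t, where \hat{Y}_t is the forecast of Y_t, b_0 is the intercept of the trend line, and b_1 is its slope, i.e. the average change in the observed value per unit change in t. By least squares, b_1=\frac{n\sum tY-\sum t\sum Y}{n\sum t^2-(\sum t)^2} and b_0=\bar{Y}-b_1\bar{t}.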
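A minimal sketch of the least-squares formulas for b_0 and b_1, assuming time is indexed t = 1..n. The function name is hypothetical, and the data are chosen to lie exactly on Y = 1 + 2t so the result is easy to check; the forecast is the fitted line extrapolated to t = n + 1.

```python
def linear_trend(ys):
    """Least-squares estimates (b0, b1) of Y_hat_t = b0 + b1*t with t = 1..n."""
    n = len(ys)
    ts = range(1, n + 1)
    sum_t, sum_y = sum(ts), sum(ys)
    sum_tt = sum(t * t for t in ts)
    sum_ty = sum(t * y for t, y in zip(ts, ys))
    b1 = (n * sum_ty - sum_t * sum_y) / (n * sum_tt - sum_t ** 2)
    b0 = sum_y / n - b1 * sum_t / n           # b0 = Y_bar - b1 * t_bar
    return b0, b1


ys = [3.0, 5.0, 7.0, 9.0]                      # hypothetical data on Y = 1 + 2t
b0, b1 = linear_trend(ys)
print(b0, b1)                                  # 1.0 2.0
print("forecast for t = 5:", b0 + b1 * 5)      # 11.0
```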

(2) Nonlinear trend prediction

A trend in a series is generally considered to be formed by some fixed factors acting in a consistent direction. If these factors change linearly with time, a linear trend can be fitted; if they produce a nonlinear pattern (a non-linear trend), an appropriate trend curve must be fitted.

i: Exponential curve: used to describe phenomena that grow or decline geometrically, i.e. whose observed values change exponentially over time, or grow or decay at a roughly constant rate. Most natural growth and many economic series show this kind of trend.

Trend equation: \hat{Y}_t=b_0b_1^{t}, where b_0 and b_1 are coefficients to be determined.

If b_1>1, the amount of growth increases as t increases; if b_1<1, the amount of growth decreases as t increases; if b_0>0 and 0<b_1<1, the forecast \hat{Y}_t gradually decreases, approaching 0 in the limit.

To determine b_0 and b_1, the equation can be linearized by taking logarithms of both sides: \lg\hat{Y}_t=\lg b_0+t\lg b_1

By the least squares principle, \lg b_0 and \lg b_1 are found from the normal equations for a straight line below; taking the antilogarithms of \lg b_0 and \lg b_1 then gives b_0 and b_1.

\left\{\begin{matrix}\sum \lg Y_t=n\lg b_0+\lg b_1\sum t\\\sum t\lg Y_t=\lg b_0\sum t+\lg b_1\sum t^2\end{matrix}\right.
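A sketch of the log-linear fit: it solves the two normal equations above for A = \lg b_0 and B = \lg b_1, then takes antilogarithms. It assumes every observation is positive (logarithms are undefined otherwise) and t = 1..n; the function name and the data generated from b_0 = 2, b_1 = 1.5 are hypothetical.

```python
import math


def exponential_trend(ys):
    """Fit Y_hat_t = b0 * b1**t (t = 1..n) by least squares on lg Y_t; requires Y_t > 0."""
    n = len(ys)
    ts = range(1, n + 1)
    lys = [math.log10(y) for y in ys]
    sum_t, sum_tt = sum(ts), sum(t * t for t in ts)
    sum_l = sum(lys)
    sum_tl = sum(t * l for t, l in zip(ts, lys))
    # Solve the two normal equations for A = lg b0 and B = lg b1.
    B = (n * sum_tl - sum_t * sum_l) / (n * sum_tt - sum_t ** 2)
    A = (sum_l - B * sum_t) / n
    return 10 ** A, 10 ** B                 # antilogarithms give b0 and b1


ys = [2 * 1.5 ** t for t in range(1, 7)]    # hypothetical data from b0 = 2, b1 = 1.5
b0, b1 = exponential_trend(ys)
print(round(b0, 6), round(b1, 6))           # 2.0 1.5
```

Because the fit minimizes squared errors of \lg Y rather than of Y, its estimates can differ slightly from a direct nonlinear least-squares fit when the data are noisy.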

 

ii: Polynomial curve:

Some phenomena change in complex ways: they do not follow a fixed form but rise and fall, possibly with several turning points. In that case a polynomial function must be fitted. With one turning point, a second-order curve (a parabola) can be fitted; with two turning points, a third-order curve; with k-1 turning points, a curve of order k, i.e. \hat{Y}_t=b_0+b_1t+b_2t^2+\cdots+b_kt^k.
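A pure-Python sketch of fitting a polynomial trend of order k: it builds the least-squares normal equations and solves them by Gaussian elimination with partial pivoting. The function names and the data with one turning point are hypothetical; for high orders or long series a numerically sturdier solver would be preferable.

```python
def poly_trend(ys, order):
    """Least-squares coefficients [b0, ..., bk] of Y_hat_t = b0 + b1*t + ... + bk*t**k, t = 1..n."""
    ts = range(1, len(ys) + 1)
    m = order + 1
    # Normal equations: a[i][j] = sum t**(i+j), rhs[i] = sum t**i * Y_t.
    a = [[float(sum(t ** (i + j) for t in ts)) for j in range(m)] for i in range(m)]
    rhs = [float(sum(t ** i * y for t, y in zip(ts, ys))) for i in range(m)]
    for col in range(m):                   # forward elimination with partial pivoting
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for r in range(col + 1, m):
            factor = a[r][col] / a[col][col]
            for c in range(col, m):
                a[r][c] -= factor * a[col][c]
            rhs[r] -= factor * rhs[col]
    coef = [0.0] * m
    for i in reversed(range(m)):           # back substitution
        coef[i] = (rhs[i] - sum(a[i][j] * coef[j] for j in range(i + 1, m))) / a[i][i]
    return coef


def poly_value(coef, t):
    """Evaluate the fitted trend at time t."""
    return sum(b * t ** i for i, b in enumerate(coef))


ys = [5 + 2 * t - 0.5 * t * t for t in range(1, 9)]   # hypothetical data, one turning point
coef = poly_trend(ys, order=2)
print([round(b, 6) for b in coef])                    # [5.0, 2.0, -0.5]
print(round(poly_value(coef, 9), 6))                  # forecast for t = 9: -17.5
```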

6, forecasting composite series by decomposition
A composite series contains trend, seasonal, cyclical, and random components. Such series are forecast by decomposing the series into its individual components one by one and then forecasting. Because analyzing the cyclical component requires data covering many years, which are hard to obtain in practice, the decomposition model used is: Y_t=T_t\times S_t\times I_t

Forecasting methods include seasonal multiple regression models, seasonal autoregressive models, and time series decomposition.

Steps of decomposition forecasting:

Step 1: Identify and separate the seasonal component. Compute the seasonal indices to determine the seasonal component of the series, then separate it out by dividing each observation by its corresponding seasonal index, which removes the seasonality.

Step 2: Build a forecasting model and forecast. Fit an appropriate forecasting model to the series with the seasonal component removed, and forecast from that model.

Step 3: Compute the final forecasts. Multiply each forecast by the corresponding seasonal index to obtain the final forecast.
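The three steps can be sketched end to end under two stated assumptions: the seasonal indices are already known (the next subsection shows how to compute them), and a linear trend is used as the forecasting model in step 2, which is only one possible choice. The function name, the indices, and the two years of quarterly data (generated from T_t = 100 + 2t with no irregular component) are hypothetical.

```python
def decomposition_forecast(ys, seasonal_index, horizon):
    """Forecast `horizon` periods ahead with the multiplicative model Y = T * S * I.

    Assumes ys[0] falls in season 0 and seasonal_index holds one value per season.
    The linear trend in step 2 is an illustrative choice of model.
    """
    period, n = len(seasonal_index), len(ys)
    # Step 1: divide by the seasonal index to remove seasonality (Y / S = T * I).
    adjusted = [y / seasonal_index[i % period] for i, y in enumerate(ys)]
    # Step 2: fit a linear trend to the adjusted series by least squares (t = 1..n).
    ts = range(1, n + 1)
    sum_t, sum_a = sum(ts), sum(adjusted)
    sum_tt = sum(t * t for t in ts)
    sum_ta = sum(t * a for t, a in zip(ts, adjusted))
    b1 = (n * sum_ta - sum_t * sum_a) / (n * sum_tt - sum_t ** 2)
    b0 = (sum_a - b1 * sum_t) / n
    # Step 3: multiply each trend forecast by the seasonal index of its season.
    return [(b0 + b1 * t) * seasonal_index[(t - 1) % period]
            for t in range(n + 1, n + horizon + 1)]


s_index = [0.8, 1.2, 1.1, 0.9]                                    # hypothetical indices
ys = [(100 + 2 * t) * s_index[(t - 1) % 4] for t in range(1, 9)]  # two hypothetical years
print([round(f, 2) for f in decomposition_forecast(ys, s_index, 4)])
# [94.4, 144.0, 134.2, 111.6]
```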

(1) identifying and separating the seasonal component

Seasonal analysis represents the seasonal component of each year by seasonal indices, which describe the pattern of change across the seasons of a year.

i: calculating the seasonal index

The seasonal index describes the typical seasonal pattern of a series across the months or quarters of a year. In the multiplicative model, the seasonal indices are constructed so that their average is 100%, and each index reflects the level of that month or quarter relative to the overall average. If the phenomenon has no seasonal variation, every seasonal index equals 100%; if some months or quarters show significant seasonal variation, their indices are greater or less than 100%. The degree of seasonal fluctuation is therefore judged by how far the seasonal indices deviate from their average (100%).

There are several ways to compute seasonal indices. The steps of the moving-average trend-removal (ratio-to-moving-average) method are:

Step 1: Compute moving averages (a 4-term moving average for quarterly data, a 12-term moving average for monthly data), then center them by taking a 2-term moving average of those results. This gives the centered moving average (CMA).

Step 2: Compute the ratio to moving average, i.e. the seasonal ratio: divide each observation by its corresponding centered moving average, then average these ratios for each month or quarter.

Step 3: Adjust the seasonal indices. Because the seasonal indices should average 100% (i.e. 1), if the averages of the seasonal ratios from Step 2 do not average 1, they must be adjusted: divide each season's average ratio from Step 2 by the overall average of those ratios.
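A sketch of the three-step ratio-to-moving-average method, assuming an even period (4 or 12), a series that starts at season 0, and at least two full years of data so that every season gets at least one ratio. The function name is hypothetical. The test data repeat one seasonal pattern around a constant level of 100, so the method should recover that pattern exactly.

```python
def seasonal_indices(ys, period=4):
    """Ratio-to-moving-average seasonal indices, scaled to average 1 (i.e. 100%)."""
    n, half = len(ys), period // 2
    # Step 1: period-term moving averages, then a 2-term average to center them.
    ma = [sum(ys[i:i + period]) / period for i in range(n - period + 1)]
    cma = {i + half: (ma[i] + ma[i + 1]) / 2 for i in range(len(ma) - 1)}
    # Step 2: seasonal ratios Y / CMA, averaged by season.
    ratios = {s: [] for s in range(period)}
    for i, c in cma.items():
        ratios[i % period].append(ys[i] / c)
    avg = [sum(ratios[s]) / len(ratios[s]) for s in range(period)]
    # Step 3: divide by the overall average so the indices average exactly 1.
    overall = sum(avg) / period
    return [a / overall for a in avg]


s_true = [0.8, 1.2, 1.1, 0.9]
ys = [100 * s for s in s_true] * 3     # three hypothetical years at a constant level
idx = seasonal_indices(ys, period=4)
print([round(v, 3) for v in idx])      # [0.8, 1.2, 1.1, 0.9]
```

With a 4-term window, the first moving average is centered between observations 2 and 3 (1-based), so averaging two neighbouring moving averages lines the CMA up with an actual observation, which is the purpose of the centering step.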

ii: separating the seasonal component

After the seasonal indices are computed, divide each actual observation by its corresponding seasonal index to separate the seasonal component from the series: \frac{Y}{S}=\frac{T\times S\times I}{S}=T\times I

The result is the series with the seasonal component removed, which shows how the series changes without the influence of seasonal factors.
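Seasonal adjustment itself is a single division per observation. The sketch assumes the series starts at season 0 and that the indices are given; the function name, indices, and data are hypothetical. After dividing, the quarterly pattern disappears and only the level shift from 100 to 105 between the two years remains.

```python
def deseasonalize(ys, seasonal_index):
    """Divide each observation by its season's index: Y / S = T * I (ys[0] is season 0)."""
    period = len(seasonal_index)
    return [y / seasonal_index[i % period] for i, y in enumerate(ys)]


s_index = [0.8, 1.2, 1.1, 0.9]                               # hypothetical quarterly indices
ys = [80.0, 120.0, 110.0, 90.0, 84.0, 126.0, 115.5, 94.5]   # hypothetical data
print([round(v, 6) for v in deseasonalize(ys, s_index)])
# [100.0, 100.0, 100.0, 100.0, 105.0, 105.0, 105.0, 105.0]
```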
----------------
Disclaimer: This is an original article by CSDN blogger "mengjizhiyou", licensed under CC 4.0 BY-SA. When reproducing it, please attach the original source link and this statement.
Original link: https://blog.csdn.net/mengjizhiyou/article/details/82683448

Origin www.cnblogs.com/Koi504330/p/11925463.html