Time series models in machine learning

1. Time series concept
In production and scientific research, one variable or a group of variables x(t) is observed and measured at a series of times t1, t2, …, tn (t is the independent variable). The resulting values, arranged in time order, are used to build mathematical expressions that describe the variables and the relationships between them. Data collected at different points in time over the same interval is called a time series. Such a time-ordered series is also called dynamic data, and it is used to predict long-term development trends. Dynamic data of this kind is very common in the natural, economic, and social fields: for example, the month-by-month or year-by-year rise and fall of animal and plant populations under given ecological conditions, the daily closing index of a stock exchange, and monthly GNP, unemployment figures, or price indices.

2. The difference between time series and regression problems
1. A time series depends on time, whereas a regression model assumes that the observations are independent of one another.
2. A time series rises or falls as time passes, and it may also show seasonal fluctuations.

3. Time series model

(1) The ARIMA model requires the data to be stationary.
Stationarity: the fitted curve obtained from the sample time series must be able to continue along its existing "inertia" for some time into the future, and the mean and variance of the series must not change noticeably.
Stationarity is divided into strict stationarity and weak stationarity:
Strict stationarity: the distribution does not change with time. An example is standard normal white noise: however it is sampled, the expectation is 0 and the variance is 1.
Weak stationarity: the expectation and the correlation coefficients (the dependence) remain unchanged. The value Xt at a given time t depends on its past information.
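A rough way to see what "mean and variance do not change" means is to compare the statistics of the first and second half of a series. The sketch below is only a crude sanity check, not a formal stationarity test; the helper name `half_stats` and both toy series are assumptions made for illustration.

```python
from statistics import fmean, pvariance

def half_stats(series):
    """Return (mean, variance) for the first and second half of a series."""
    mid = len(series) // 2
    first, second = series[:mid], series[mid:]
    return (fmean(first), pvariance(first)), (fmean(second), pvariance(second))

# A bounded repeating pattern: both halves have the same mean and variance.
periodic = [0, 1, 0, -1] * 10
# A steady upward trend: the mean drifts, which rules out stationarity.
trend = list(range(40))

print(half_stats(periodic))
print(half_stats(trend))
```

Matching statistics in both halves are consistent with stationarity but do not prove it; a clear drift in the mean, as in the trend series, is enough to call for differencing.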

(2) Difference method: take the difference between the values of the time series at t and t-1 (first-order difference)
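As a minimal sketch of differencing, the function below subtracts each value from the one that follows it, and repeats this d times for higher-order differences. The name `difference` and the sample data are illustrative assumptions.

```python
def difference(series, d=1):
    """Apply first-order differencing d times: x[t] - x[t-1]."""
    for _ in range(d):
        series = [curr - prev for prev, curr in zip(series, series[1:])]
    return series

squares = [1, 4, 9, 16, 25]
print(difference(squares))       # [3, 5, 7, 9]
print(difference(squares, d=2))  # [2, 2, 2]
```

Each pass shortens the series by one value. Here the second difference is constant, so differencing twice has removed the trend from this toy series.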

(3) Commonly used time series models are
1. AR model (autoregressive model)
2. MA model (moving average model)
3. ARMA model (autoregressive moving average model)
4. ARIMA model (autoregressive integrated moving average model)

1. AR model (autoregressive model)
The AR model describes the relationship between the current value and historical values, and uses the variable's own historical data to predict itself.
The autoregressive model must meet the stationarity requirement.
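In its standard textbook form, an AR model of order p can be written as below. The symbols are the conventional ones and are an assumption here, since they may differ from the notation the original post used: c is a constant, the φ_i are the autoregressive coefficients, and ε_t is the error term.

```latex
y_t = c + \sum_{i=1}^{p} \varphi_i \, y_{t-i} + \varepsilon_t
```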
Restrictions: the AR model predicts from the series' own data, so the series must be stationary and must be autocorrelated. If the autocorrelation coefficient is less than 0.5, the model should not be used. It is only suitable for predicting phenomena that are related to their own earlier values.
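To make "using the variable's own history to predict itself" concrete, here is a minimal sketch that fits an AR(1) model, x[t] = c + phi * x[t-1] + e[t], by ordinary least squares and makes a one-step forecast. The function name `fit_ar1` and the toy series (generated without noise from c = 2, phi = 0.5) are assumptions for illustration, not values from the source.

```python
def fit_ar1(series):
    """Least-squares estimate of c and phi in x[t] = c + phi * x[t-1] + e[t]."""
    x = series[:-1]   # lagged values x[t-1]
    y = series[1:]    # current values x[t]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    phi = sxy / sxx
    c = mean_y - phi * mean_x
    return c, phi

# Noise-free toy series built with c = 2 and phi = 0.5 (hypothetical values).
series = [10.0]
for _ in range(20):
    series.append(2 + 0.5 * series[-1])

c, phi = fit_ar1(series)
next_value = c + phi * series[-1]  # one-step-ahead forecast
print(round(c, 6), round(phi, 6), round(next_value, 6))
```

Because the toy data contains no noise, the fit recovers the generating coefficients almost exactly; with real data the estimates would carry sampling error.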

2. MA model (moving average model)
The MA model focuses on the accumulation of the error terms in the autoregressive model, and can effectively smooth out random fluctuations in prediction.
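In the usual formulation, an MA(q) value is the series mean plus the current error and a weighted sum of the previous q errors: x[t] = mu + e[t] + theta_1 * e[t-1] + ... + theta_q * e[t-q]. The sketch below builds such a series from a given list of errors; the name `ma_series`, the error values, and the coefficient 0.5 are assumptions for illustration.

```python
def ma_series(errors, thetas, mu=0.0):
    """Build an MA(q) series: x[t] = mu + e[t] + sum(theta[i] * e[t-1-i])."""
    q = len(thetas)
    out = []
    for t in range(q, len(errors)):
        past = sum(theta * errors[t - 1 - i] for i, theta in enumerate(thetas))
        out.append(mu + errors[t] + past)
    return out

errors = [1.0, -1.0, 2.0, 0.0, -2.0]   # hypothetical error terms
print(ma_series(errors, thetas=[0.5]))  # [-0.5, 1.5, 1.0, -2.0]
```

The first q errors are used only as history, so the output is q values shorter than the input.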
3. ARMA model (autoregressive moving average model)
The ARMA model is the combination of the first two: autoregression plus moving average.
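Written out in conventional notation (an assumption, since the notation of the original post is not available), an ARMA(p, q) model adds an autoregressive sum over past values and a moving average sum over past errors. Here c is a constant, the φ_i are autoregressive coefficients, the θ_j are moving average coefficients, and ε_t is the error term.

```latex
y_t = c + \sum_{i=1}^{p} \varphi_i \, y_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j}
```

Setting q = 0 leaves a pure AR(p) model, and setting p = 0 leaves a pure MA(q) model with c as its mean.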
4. ARIMA model (autoregressive integrated moving average model)
Principle: after a non-stationary time series is transformed into a stationary one, the dependent variable is regressed only on its own lagged values and on the current and lagged values of the random error term, and the model is built on that regression.

ARIMA(p, d, q): AR is autoregression and p is the number of autoregressive terms; MA is moving average and q is the number of moving average terms; d is the number of times the series is differenced to make it stationary.

4. Model parameter selection
1. Selecting p and q
Autocorrelation function (ACF):
An ordered sequence of random variables is compared with itself; the autocorrelation function reflects the correlation between values of the same sequence at different times.
The autocorrelation at lag k, ρk, takes values in [-1, 1].
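A minimal sketch of the sample autocorrelation: for each lag k, multiply the deviations from the mean at times t and t + k, sum them, and divide by the sum of squared deviations. The function name `acf` and the sample data are illustrative assumptions.

```python
def acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        for k in range(max_lag + 1)
    ]

print(acf([1, 2, 3, 4, 5], max_lag=2))  # [1.0, 0.4, -0.1]
```

Lag 0 is always 1 because the series is compared with an unshifted copy of itself.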
Partial autocorrelation function (PACF)
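The partial autocorrelation at lag k measures the correlation between x(t) and x(t-k) after the influence of the lags in between has been removed. One standard way to compute it from autocorrelations is the Durbin-Levinson recursion, sketched below. The function name `pacf_from_acf` and the input values (autocorrelations of the form 0.5^k, as an AR(1) process with coefficient 0.5 would have) are assumptions for illustration.

```python
def pacf_from_acf(rho):
    """Partial autocorrelations from autocorrelations rho[0..K] (Durbin-Levinson)."""
    pacf = [1.0]
    phi_prev = []  # coefficients of the order k-1 model
    for k in range(1, len(rho)):
        num = rho[k] - sum(phi_prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update the lower-order coefficients, then append the new one.
        phi_curr = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)]
        phi_curr.append(phi_kk)
        pacf.append(phi_kk)
        phi_prev = phi_curr
    return pacf

rho = [1.0, 0.5, 0.25, 0.125]  # hypothetical ACF of an AR(1) process
print(pacf_from_acf(rho))       # [1.0, 0.5, 0.0, 0.0]
```

The partial autocorrelation drops to zero after lag 1, which is the cut-off pattern that suggests p = 1 when reading a PACF plot.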
5. ARIMA modeling process:
1. Make the sequence stationary (determine d by differencing)
2. Determine the orders p and q: ACF and PACF
3. ARIMA(p, d, q)
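The three steps can be sketched end to end for the simplest case, ARIMA(1, 1, 0): difference once, fit an AR(1) model to the differences by least squares, then forecast the differences and sum them back to the original scale. This is a hand-rolled illustration under those fixed orders, not a general ARIMA implementation; the function name `arima_110_forecast` and the toy series are assumptions.

```python
from itertools import accumulate

def arima_110_forecast(series, steps):
    """Forecast with a hand-rolled ARIMA(1, 1, 0): difference, fit AR(1), integrate."""
    # Step 1: d = 1, so difference the series once.
    diffs = [b - a for a, b in zip(series, series[1:])]
    # Step 2: p = 1 and q = 0, so fit w[t] = c + phi * w[t-1] by least squares.
    x, y = diffs[:-1], diffs[1:]
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    phi = sxy / sxx
    c = mean_y - phi * mean_x
    # Step 3: forecast the differences, then cumulatively sum back to levels.
    future_diffs = []
    last = diffs[-1]
    for _ in range(steps):
        last = c + phi * last
        future_diffs.append(last)
    levels = list(accumulate(future_diffs, initial=series[-1]))
    return levels[1:]  # drop the starting level, keep only the forecasts

# Toy series whose increments follow w[t] = 1 + 0.5 * w[t-1] (hypothetical).
series = [0.0]
step = 4.0
for _ in range(10):
    series.append(series[-1] + step)
    step = 1 + 0.5 * step

print(arima_110_forecast(series, steps=2))
```

In practice a library implementation such as `statsmodels.tsa.arima.model.ARIMA` would normally be used instead, since it handles general orders and moving average terms.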

6. Model evaluation criteria:
Choose the most suitable p and q.
Model residual test:
Check whether the residuals follow a normal distribution with mean 0 and constant variance.
QQ plot: if the points fall on a straight line, the residuals are normally distributed.
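A QQ plot pairs the sorted residuals with the quantiles a normal distribution would produce at the same positions. The sketch below computes those pairs without plotting and uses their correlation as a rough measure of how straight the line is. The helper names `qq_points` and `pearson`, the plotting positions (i + 0.5) / n, and the residual values are all assumptions for illustration.

```python
from statistics import NormalDist, fmean, pstdev

def qq_points(residuals):
    """Pair theoretical standard-normal quantiles with standardized, sorted residuals."""
    n = len(residuals)
    mean, std = fmean(residuals), pstdev(residuals)
    sample = sorted((r - mean) / std for r in residuals)
    theory = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theory, sample))

def pearson(pairs):
    """Correlation of the two coordinates; near 1 means the QQ points lie close to a line."""
    xs, ys = zip(*pairs)
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

residuals = [-1.2, 0.3, 0.8, -0.4, 0.1, 1.1, -0.7, 0.0, 0.5, -0.5]  # hypothetical
print(round(fmean(residuals), 3))                 # residual mean, should be near 0
print(round(pearson(qq_points(residuals)), 3))    # straightness of the QQ line
```

A correlation close to 1 together with a residual mean close to 0 is consistent with well-behaved residuals, although with so few points it is only indicative.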

7. Time series model applications

LSTM model analysis and specific implementation of time series data prediction (python implementation)

Time series prediction based on wavelet transform, implemented in Python, from Snowball

Origin blog.csdn.net/XiaoMaEr66/article/details/105284877