Reprinted from: Machine Learning Research Institute Author: daydaymoyu
Source: https://zhuanlan.zhihu.com/p/424609116
Reference https://bookdown.org/gary_a_napier/time_series_lecture_notes/ChapterOne.html#time-series-modelling
Definition of time series
A time series process is defined as a stochastic process $\{X_t : t \in T\}$, that is, a collection of random variables ordered in time, where the value at each time point is treated as a random variable. $T$ is the index set that fixes the set of times defining the time series process and producing the observations. Throughout, it is assumed that
The values of the random variables $X_t$ are continuous.
The set of time indices $T$ is discrete and equidistant.
Throughout, the following notation is used:
Random variables are denoted by capital letters, i.e. $X_1, X_2, X_3, \dots$; their values are sampled from a distribution, and random variables can be defined for an infinite number of time points.
Observations are denoted by lowercase letters, i.e. $x_1, x_2, x_3, \dots$. An observation $x_t$ can be thought of as a realization of the random variable $X_t$. In practice, however, our observations are finite, so we define the observations to be $x_1, x_2, \dots, x_n$.
Goals of Time Series Analysis
Given a set of time series data, one usually wants to answer one or more questions about it. The main types of problems that arise with time series data depend on the context of the data and the reason for collecting it. Here are some common goals:
Description: Describe the main characteristics of the time series, e.g.: is the series increasing or decreasing; is there a seasonal pattern (e.g. higher in summer, lower in winter); how does a second explanatory variable affect the value of the time series?
Monitoring : Detect when time series behavior changes, such as a sudden drop in sales, or a sudden spike.
Forecasting : Predict future values of a time series from current values and quantify the uncertainty in those forecasts, such as predicting the temperature in the next few days based on today's temperature.
Regression: Given multiple time series, together with additional values corresponding to those series, find the relationships among them.
Classification : Given multiple time series, classify them by similarity.
…
Modeling of time series
Time series data is typically decomposed into the following three components.
Trend - Trends represent long-term changes in the mean of time series data over time. If there is a trend, its shape is often of interest, although it may not be linear.
Seasonal effect - A seasonal effect is a trend in a time series that repeats at regular intervals. Strictly speaking, a seasonal effect is just an effect that repeats every year, but in a more general context, the term can be used more broadly to mean any pattern that repeats on a regular basis.
Unexplained variation - Unexplained variation is the remaining variation in the time series after any trend and seasonal variation has been removed. Such unexplained changes may be independent or may exhibit short-term correlations.
Therefore, a simple model of time series data can be written in two ways:
Additive model:
$$x_t = m_t + s_t + z_t$$
Multiplicative model:
$$x_t = m_t \cdot s_t \cdot z_t$$
where $m_t$ represents the trend, $s_t$ the seasonal effect, and $z_t$ the unexplained variation. An additive model is appropriate when the trend and seasonal variation act independently, whereas a multiplicative model is required when the magnitude of the seasonal effect depends on the magnitude of the trend. A simple diagram is as follows:
Example of additive model
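To make the distinction concrete, here is a minimal NumPy sketch that builds one series of each type from illustrative trend, seasonal, and noise components (the specific trend slope, period, and amplitudes are assumptions chosen for demonstration, not values from the article):

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(48)  # e.g. 4 years of monthly data (illustrative)

m_t = 10 + 0.5 * t                      # trend component
s_t = 5 * np.sin(2 * np.pi * t / 12)    # seasonal effect, period 12
z_t = rng.normal(0, 1, size=t.size)     # unexplained variation

# Additive model: x_t = m_t + s_t + z_t -- seasonal swing stays constant.
x_add = m_t + s_t + z_t

# Multiplicative model: the seasonal and noise factors scale with the trend,
# so the seasonal swing grows as the trend grows.
x_mul = m_t * (1 + 0.2 * np.sin(2 * np.pi * t / 12)) * np.exp(0.05 * z_t)

# Compare the seasonal swing in the first and last year of the multiplicative series.
early_swing = x_mul[:12].max() - x_mul[:12].min()
late_swing = x_mul[-12:].max() - x_mul[-12:].min()
```

Plotting `x_add` and `x_mul` side by side shows the visual difference: in the multiplicative series the amplitude of the oscillations increases along with the level.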
Properties of Time Series
(mean, variance, autocovariance function, autocorrelation function)
Given a time series process $\{X_t\}$ and observations $x_1, \dots, x_n$, we typically characterize it using the following properties.
Mean function
For all $t$, the mean function of the time series process is defined as
$$\mu(t) = E[X_t]$$
For real data, we usually assume that the mean is constant, so we can estimate it by
$$\hat{\mu} = \bar{x} = \frac{1}{n}\sum_{t=1}^{n} x_t$$
If the mean of the data is not constant, for example due to a trend or seasonal variation, other methods should be used to estimate this, as will be discussed later.
Variance function
For all $t$, the variance function of the time series process is defined as
$$\sigma^2(t) = E\big[(X_t - \mu(t))^2\big]$$
The standard deviation function is defined as $\sigma(t) = \sqrt{\sigma^2(t)}$.
For real data, we usually assume that the variance is also constant, so we can estimate it by
$$\hat{\sigma}^2 = \frac{1}{n-1}\sum_{t=1}^{n}(x_t - \bar{x})^2$$
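A minimal sketch of these two estimators on simulated data assumed to have constant mean and variance (the true values 5 and 4 are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(42)
# Simulated series with constant mean 5 and variance 4 (std 2).
x = rng.normal(loc=5.0, scale=2.0, size=10_000)

mu_hat = x.mean()                                      # (1/n) * sum of x_t
sigma2_hat = ((x - mu_hat) ** 2).sum() / (x.size - 1)  # (1/(n-1)) * sum of squared deviations
```

With 10,000 points, both estimates land close to the true values; with short series the sampling error is correspondingly larger.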
Autocovariance and autocorrelation functions
Recall that for arbitrary random variables $X$ and $Y$, covariance and correlation are defined as follows.
Covariance:
$$\operatorname{Cov}(X, Y) = E\big[(X - \mu_X)(Y - \mu_Y)\big]$$
Correlation:
$$\operatorname{Corr}(X, Y) = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y}$$
Correlation is the covariance rescaled to lie between -1 and 1, where 1 indicates a strong positive correlation, 0 indicates no linear correlation, and -1 indicates a strong negative correlation; note that correlation measures only linear association.
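As a quick check of these definitions, NumPy's `np.cov` and `np.corrcoef` compute the sample versions directly; the series `y` below is constructed (illustratively) to be positively correlated with `x`, with a theoretical correlation of 0.8:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=5_000)
y = 0.8 * x + rng.normal(scale=0.6, size=5_000)  # correlated with x by construction

cov_xy = np.cov(x, y)[0, 1]        # sample Cov(X, Y)
corr_xy = np.corrcoef(x, y)[0, 1]  # sample Corr(X, Y) = Cov / (sigma_X * sigma_Y)
```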
For a time series process, the random variables are measurements at different points in time. The dependence between them is described by the autocovariance and autocorrelation functions; the prefix "auto" indicates that both random variables are measurements of the same quantity.
For all $t$ and $k$, the autocovariance function (ACVF) is defined as:
$$\gamma(t, t+k) = \operatorname{Cov}(X_t, X_{t+k}) = E\big[(X_t - \mu(t))(X_{t+k} - \mu(t+k))\big]$$
where $\mu(t) = E[X_t]$ and $\mu(t+k) = E[X_{t+k}]$.
For all $t$ and $k$, the autocorrelation function (ACF) is defined as:
$$\rho(t, t+k) = \frac{\gamma(t, t+k)}{\sigma(t)\,\sigma(t+k)}$$
where $\sigma(t)$ and $\sigma(t+k)$ are the standard deviations of $X_t$ and $X_{t+k}$.
The above definitions describe the ideal situation, in which several sampled values are available at times $t$ and $t+k$ so the calculation can be performed. In real scenarios this condition is rarely met, because at any given point in time we usually have only one sampled data point.
To compute autocovariance and autocorrelation functions for real data, it is usually assumed that the dependency structure in the data does not change over time. That is, we assume
$$\gamma(t, t+k) = \gamma(k) \quad \text{for all } t$$
Under this assumption, the only factor affecting the covariance is the distance in time between the two random variables; this distance is called the lag.
Therefore, the only thing that needs to be calculated is the set of autocovariances:
$$\gamma(k), \quad k = 0, 1, 2, \dots$$
In this case, the autocorrelation function becomes
$$\rho(k) = \frac{\gamma(k)}{\gamma(0)}$$
The premise of this calculation is that the dependency structure in the data does not change over time: the covariance does not depend on the specific position $t$, only on the lag $k$.
Estimating the autocorrelation function
For time series data, the autocovariance and autocorrelation functions measure the covariance/correlation between a single time series and lagged copies of itself. Below is the calculation of the sample autocovariance and autocorrelation functions for real data.
lag=0
The sample autocovariance function at lag 0 (lag=0) is $c(0)$, the covariance between $x_t$ and itself. Following the formula above, it is computed as
$$c(0) = \frac{1}{n}\sum_{t=1}^{n}(x_t - \bar{x})^2$$
Therefore, the sample autocovariance at lag 0 is the sample variance (with divisor $n$). Similarly, the autocorrelation at lag 0 is
$$r(0) = \frac{c(0)}{c(0)} = 1$$
lag=1
The sample autocovariance function at lag 1 (lag=1) is the covariance between the series $x_1, \dots, x_{n-1}$ and the same series shifted by one time point, $x_2, \dots, x_n$. Following the formula above, the autocovariance and autocorrelation at lag 1 are computed as
$$c(1) = \frac{1}{n}\sum_{t=1}^{n-1}(x_t - \bar{x})(x_{t+1} - \bar{x})$$
and
$$r(1) = \frac{c(1)}{c(0)}$$
where $x_t$ is the earlier observation and $x_{t+1}$ is the later one.
In practice, it is usually assumed that the mean and variance of the first $n-1$ observations equal those of the last $n-1$ observations, which simplifies the expression above. Also, the covariance formula uses the divisor $n$ rather than the unbiased choice $n-2$; when $n$ is large, this choice has little practical effect on the computation.
lag=$k$
The sample autocovariance function (ACVF) of a time series is defined as:
$$c(k) = \frac{1}{n}\sum_{t=1}^{n-k}(x_t - \bar{x})(x_{t+k} - \bar{x}), \quad k = 0, 1, 2, \dots$$
The sample autocorrelation function (ACF) is defined as
$$r(k) = \frac{c(k)}{c(0)}$$
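These two sample formulas translate directly into a short NumPy sketch (the cumulative-sum test series is an illustrative choice, picked because it is strongly autocorrelated):

```python
import numpy as np

def acvf(x, k):
    """Sample autocovariance c(k) = (1/n) * sum_{t=1}^{n-k} (x_t - xbar)(x_{t+k} - xbar)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    xbar = x.mean()
    return np.sum((x[: n - k] - xbar) * (x[k:] - xbar)) / n

def acf(x, k):
    """Sample autocorrelation r(k) = c(k) / c(0)."""
    return acvf(x, k) / acvf(x, 0)

rng = np.random.default_rng(7)
x = np.cumsum(rng.normal(size=500))  # cumulative sums -> strong positive autocorrelation

r0 = acf(x, 0)  # always exactly 1
r1 = acf(x, 1)  # close to 1 for this series
```

Libraries such as statsmodels provide an equivalent `acf` function with extras (confidence intervals, FFT-based computation), but the hand-rolled version above is enough to see what is being computed.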
An interactive example to help understand the autocovariance and autocorrelation functions can be found at the link below.
https://shiny.maths-stats.gla.ac.uk/gnapier/Time_Series_ACF/
Interpretation of Correlogram Plots
A correlogram is a plot with the values of the autocorrelation function on the vertical axis and the lag on the horizontal axis. It makes the correlation between different lags of a time series easy to see at a glance. Correlograms tell the analyst a great deal about a time series, including the presence of trends, seasonal variation, and short-term correlation. Here are some examples to illustrate.
Example - purely random data
Consider a time series generated by a purely random process, which has no trend, seasonality, or short-term correlation. The raw data and autocorrelation plot are shown below:
At lag 0, $r(0) = 1$; this value is usually ignored because it is the correlation of the series with itself.
For a purely random sequence with no correlation, $r(k)$ equals 1 at lag 0, and at other lags there is no clear evidence of correlation.
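This behavior is easy to reproduce: for white noise of length $n$, the sample autocorrelations at nonzero lags should fall roughly within $\pm 2/\sqrt{n}$ of zero. A minimal simulation (sample size and lag range are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=2_000)  # purely random (white noise) series
n = x.size
xbar = x.mean()

def r(k):
    """Sample autocorrelation at lag k."""
    return np.sum((x[: n - k] - xbar) * (x[k:] - xbar)) / np.sum((x - xbar) ** 2)

# r(0) is exactly 1; the first 20 nonzero lags should all be near zero.
max_abs_r = max(abs(r(k)) for k in range(1, 21))
band = 2 / np.sqrt(n)  # approximate 95% band for white noise, about 0.045 here
```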
Example - short-term correlation
Time series data with no trend or seasonality but short-term correlation is shown in the figure below and has a significant positive autocorrelation at the first few lags, followed by values approaching zero at larger lags.
Example - alternating data
Time series data that has no trend or seasonality but alternates between large and small values is shown in the plot below, and has negative autocorrelation at odd lags and positive autocorrelation at even lags. As the lag increases, the autocorrelation gets closer and closer to zero.
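One standard way to generate such alternating data (an assumption here, not the article's example) is an AR(1) process with a negative coefficient, $x_t = -0.8\,x_{t-1} + z_t$, whose theoretical ACF is $(-0.8)^k$: negative at odd lags, positive at even lags, and shrinking toward zero:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 1_000
x = np.zeros(n)
for t in range(1, n):
    # Negative coefficient makes consecutive values tend to alternate in sign.
    x[t] = -0.8 * x[t - 1] + rng.normal()

xbar = x.mean()

def r(k):
    """Sample autocorrelation at lag k."""
    return np.sum((x[: n - k] - xbar) * (x[k:] - xbar)) / np.sum((x - xbar) ** 2)

r1, r2 = r(1), r(2)  # expect r1 < 0 < r2, with |r(k)| decaying as k grows
```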
Example - data with a trend
Time series data with a trend is shown below; its autocorrelation remains positive even at large lags. The same correlogram would be observed if the trend decreased over time.
Example - data with a seasonal effect
Time series data with seasonal effects is shown in the figure below and has a regular seasonal pattern in the correlation plot.
Example - data with a trend and a seasonal effect
Time series data with both trend and seasonal effects is shown in the plot below; its correlogram shows a regular seasonal pattern, and the values are generally positive due to the trend.
Stationarity Analysis
Strictly stationary (strongly stationary)
Strict stationarity is a very demanding condition. Given a time series process $\{X_t\}$, the series is strictly stationary if, for all $k$ and all time shifts $\tau$, the joint distribution of $(X_{t_1}, \dots, X_{t_k})$ is the same as that of $(X_{t_1+\tau}, \dots, X_{t_k+\tau})$. In other words, shifting the time origin of the sequence has no effect on its joint distribution.
When $k = 1$, strict stationarity means that the distribution of $X_t$ is the same for all $t$. This implies that the mean and variance of the time series are constant, namely
$$\mu(t) = \mu \quad \text{and} \quad \sigma^2(t) = \sigma^2$$
When $k = 2$, strict stationarity means that for all $t$ and $s$, the joint distribution of $(X_t, X_s)$ depends only on the lag $s - t$.
This in turn means that the theoretical covariance and correlation functions depend only on the lag and not on the original positions.
Strict stationarity is very restrictive, and real processes rarely satisfy it. In general, only purely random processes are strictly stationary, so weakly stationary processes are used more often.
Weakly stationary
Given a time series process $\{X_t\}$, the process is weakly stationary if it satisfies the following conditions:
The mean is constant and finite, i.e. $E[X_t] = \mu < \infty$ for all $t$.
The variance is constant and finite, i.e. $\operatorname{Var}(X_t) = \sigma^2 < \infty$ for all $t$.
The autocovariance and autocorrelation functions depend only on the lag $k$, i.e. $\gamma(t, t+k) = \gamma(k)$ and $\rho(t, t+k) = \rho(k)$.
The difference between strict and weak stationarity is that the latter only requires the first two moments (mean and variance) and the autocovariances to be constant over time, while the former requires the entire joint distribution to be invariant under time shifts.
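An informal way to eyeball the first two conditions on real data is to split the series in half and compare sample means and variances; for a weakly stationary series the halves should agree closely. This is only an illustration with arbitrary thresholds, not a formal test (formal tests, such as the augmented Dickey-Fuller test, exist in libraries like statsmodels):

```python
import numpy as np

rng = np.random.default_rng(11)
x = rng.normal(size=4_000)  # white noise: weakly stationary by construction

first, second = x[:2_000], x[2_000:]

# For a weakly stationary series both gaps should be small (up to sampling error).
mean_gap = abs(first.mean() - second.mean())
var_gap = abs(first.var() - second.var())
```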
Example
Define a random walk process $\{X_t\}$ by
$$X_t = X_{t-1} + Z_t$$
where $\{Z_t\}$ is a purely random process with mean 0 and variance $\sigma^2$. Then $\{X_t\}$ is non-stationary, because (taking $X_0 = 0$)
$$X_t = \sum_{i=1}^{t} Z_i \quad \Rightarrow \quad \operatorname{Var}(X_t) = t\,\sigma^2$$
This means that the variance varies with time.
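The growing variance is easy to confirm by simulation: generate many independent random walks and compare the cross-sectional variance at two time points (the number of walks, steps, and checkpoints are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(9)
sigma = 1.0
n_walks, n_steps = 5_000, 200

# Each row is one random walk: X_t = X_{t-1} + Z_t, realized as a cumulative sum of Z_t.
walks = np.cumsum(rng.normal(0, sigma, size=(n_walks, n_steps)), axis=1)

# Var(X_t) = t * sigma^2, so the variance across walks should grow linearly in t.
var_t50 = walks[:, 49].var()    # expected near 50 * sigma^2
var_t200 = walks[:, 199].var()  # expected near 200 * sigma^2
```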