Finally, the key points of time series analysis are all clearly explained!


Reprinted from: Machine Learning Research Institute. Author: daydaymoyu
Source: https://zhuanlan.zhihu.com/p/424609116
Reference: https://bookdown.org/gary_a_napier/time_series_lecture_notes/ChapterOne.html#time-series-modelling

Definition of time series

A time series process is defined as a stochastic process: a collection of random variables ordered in time, $\{X_t : t \in T\}$, so that the value at each time point is a random variable. Here $T$ is the index set, which determines the set of time points at which the process is defined and observations are produced. It is assumed that:

  • The value of the random variable is continuous.

  • The set of time indices is discrete and equidistant.

Throughout, the following notation is used:

  • Random variables are denoted by capital letters, i.e. $X_1, X_2, \ldots$, and their values are drawn from a distribution. Random variables can be defined at infinitely many time points.

  • Observations are denoted by lowercase letters, i.e. $x_1, x_2, \ldots$; an observation $x_t$ can be thought of as a realization of the random variable $X_t$. In practice the number of observations is finite, so the observed series is defined to be $x_1, x_2, \ldots, x_n$.

Goals of Time Series Analysis

Given a set of time series data, there are usually one or more questions we wish to answer. The main types of question depend on the context of the data and the reason it was collected. Here are some common goals:

  • Description: describe the main characteristics of the time series, e.g. is the series increasing or decreasing; is there a seasonal pattern (e.g. higher in summer, lower in winter); how does a second, explanatory variable affect the values of the series?

  • Monitoring: detect when the behavior of the time series changes, such as a sudden drop in sales or a sudden spike.

  • Forecasting: predict future values of a time series from current values, and quantify the uncertainty in those forecasts, e.g. predicting the temperature over the next few days from today's temperature.

  • Regression: given several time series and an additional variable corresponding to those series, find the relationship between them.

  • Classification: given multiple time series, classify them by similarity.

  • ......

Modeling of time series

Time series data is typically decomposed into the following three components.

  • Trend - Trends represent long-term changes in the mean of time series data over time. If there is a trend, its shape is often of interest, although it may not be linear.

  • Seasonal effect - A seasonal effect is a trend in a time series that repeats at regular intervals. Strictly speaking, a seasonal effect is just an effect that repeats every year, but in a more general context, the term can be used more broadly to mean any pattern that repeats on a regular basis.

  • Unexplained variation - Unexplained variation is the remaining variation in the time series after any trend and seasonal variation has been removed. Such unexplained changes may be independent or may exhibit short-term correlations.

Therefore, a simple model of time series data can be written in one of two ways:

Additive model: $X_t = m_t + s_t + Z_t$

Multiplicative model: $X_t = m_t \cdot s_t + Z_t$

where $m_t$ represents the trend, $s_t$ the seasonal effect, and $Z_t$ the unexplained variation. An additive model is appropriate when the trend and seasonal variation act independently of each other, whereas a multiplicative model is required when the magnitude of the seasonal effect depends on the magnitude of the trend. A simple example is shown below.

  • Example of additive model

[Figure: example of a time series with additive trend and seasonal components]
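As an aside (not part of the original notes), a decomposition of this kind can be sketched in Python with statsmodels; the example below simulates a monthly series and splits it into the three components:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

# Simulated monthly series: linear trend + yearly seasonal effect + noise
rng = np.random.default_rng(0)
t = np.arange(120)
x = 0.5 * t + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 2, size=t.size)
series = pd.Series(x, index=pd.date_range("2010-01", periods=t.size, freq="MS"))

# model="additive" assumes X_t = m_t + s_t + Z_t; switch to
# model="multiplicative" when the seasonal swing scales with the trend
result = seasonal_decompose(series, model="additive", period=12)
print(result.trend.dropna().head())  # estimated trend m_t
print(result.seasonal.head(12))      # estimated repeating seasonal effect s_t
```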

Properties of Time Series

(mean, variance, autocovariance function, autocorrelation function)

Given a time series process $\{X_t : t \in T\}$ and observations $x_1, \ldots, x_n$, we typically characterize it using the following properties.

  • Mean function

For all $t \in T$, the mean function of the time series process is defined as

$\mu(t) = E(X_t)$

For real data, we usually assume that the mean is constant, so we can estimate it by

$\hat{\mu} = \bar{x} = \frac{1}{n}\sum_{t=1}^{n} x_t$

If the mean of the data is not constant, for example due to a trend or seasonal variation, other methods should be used to estimate this, as will be discussed later.

  • Variance function

For all $t \in T$, the variance function of the time series process is defined as

$\sigma^2(t) = E\big[(X_t - \mu(t))^2\big]$

The standard deviation function is defined as $\sigma(t) = \sqrt{\sigma^2(t)}$.

For real data, we usually assume that the variance is also constant, so we can estimate it by

$\hat{\sigma}^2 = s^2 = \frac{1}{n-1}\sum_{t=1}^{n} (x_t - \bar{x})^2$
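As a quick illustration (added here, not in the original), both constant-parameter estimates are one-liners in NumPy:

```python
import numpy as np

x = np.array([12.1, 11.8, 12.5, 13.0, 12.7, 12.2])  # toy observations x_1, ..., x_n
mu_hat = x.mean()           # sample mean, the estimate of mu
sigma2_hat = x.var(ddof=1)  # sample variance with divisor n-1
print(mu_hat, sigma2_hat)
```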

  • Autocovariance and autocorrelation functions

Recall that for arbitrary random variables $X$ and $Y$, covariance and correlation are defined as follows.

Covariance: $\mathrm{Cov}(X, Y) = E\big[(X - \mu_X)(Y - \mu_Y)\big]$

Correlation: $\mathrm{Corr}(X, Y) = \dfrac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}$

Correlation is covariance rescaled to lie between -1 and 1, where 1 indicates a strong positive correlation, 0 indicates no linear association, and -1 indicates a strong negative correlation; note that correlation here means linear correlation.

For a time series process, the random variables are measurements of the same quantity at different points in time. The dependence between them is described by the autocovariance and autocorrelation functions; the prefix "auto" indicates that both random variables come from the same series.

For all $s, t \in T$, the autocovariance function (ACVF) is defined as

$\gamma(s, t) = \mathrm{Cov}(X_s, X_t) = E\big[(X_s - \mu(s))(X_t - \mu(t))\big]$

where $\mu(s) = E(X_s)$ and $\mu(t) = E(X_t)$.

For all $s, t \in T$, the autocorrelation function (ACF) is defined as

$\rho(s, t) = \dfrac{\gamma(s, t)}{\sigma(s)\,\sigma(t)}$

where $\sigma(s)$ and $\sigma(t)$ are the standard deviations of $X_s$ and $X_t$.

The definitions above describe an idealized situation in which multiple samples are available at both time $s$ and time $t$, so that the expectations can be computed. This is difficult to achieve in real scenarios, because at any given time point we usually observe only a single data point.

To compute autocovariance and autocorrelation functions for real data, it is usually assumed that the dependence structure in the data does not change over time. That is, we assume

$\mathrm{Cov}(X_t, X_{t+k}) = \mathrm{Cov}(X_s, X_{s+k}) \quad \text{for all } s, t$

Under this assumption, the only factor affecting the covariance is the distance in time between the two random variables; this distance is called the lag, denoted $k$.

Therefore, the only thing that needs to be calculated is the set of autocovariances

$\gamma(k) = \mathrm{Cov}(X_t, X_{t+k}), \quad k = 0, 1, 2, \ldots$

In this case, the autocorrelation function becomes

$\rho(k) = \dfrac{\gamma(k)}{\gamma(0)}$

The premise of this simplification is, again, that the dependence structure does not change over time: the covariance does not depend on the specific position in the series, only on the lag.

Estimating the autocorrelation function

For time series data, the autocovariance and autocorrelation functions measure the covariance/correlation between the series and lagged copies of itself. The calculation for real data proceeds as follows.

lag=0

The sample autocovariance function at lag 0 (lag=0), denoted $c_0$, is the covariance between $X_t$ and itself. Following the formula above, it is computed as

$c_0 = \frac{1}{n}\sum_{t=1}^{n} (x_t - \bar{x})^2$

Therefore, the sample autocovariance at lag 0 is simply the sample variance (with divisor $n$). Similarly, the sample autocorrelation at lag 0 is

$r_0 = \dfrac{c_0}{c_0} = 1$
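A one-line check (illustrative, not from the original) that the lag-0 sample autocovariance equals the divisor-$n$ sample variance:

```python
import numpy as np

x = np.random.default_rng(5).normal(size=100)
c0 = np.sum((x - x.mean()) ** 2) / x.size
assert np.isclose(c0, x.var())  # np.var uses the divisor n by default
```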

lag=1

The sample autocovariance function at lag 1 (lag=1), denoted $c_1$, is the covariance between the series and itself shifted by one time point, i.e. between $(x_1, \ldots, x_{n-1})$ and $(x_2, \ldots, x_n)$. Following the formula above, the autocovariance and autocorrelation at lag 1 are computed as

$c_1 = \frac{1}{n}\sum_{t=1}^{n-1} (x_t - \bar{x})(x_{t+1} - \bar{x})$

and

$r_1 = \dfrac{c_1}{c_0}$

where $x_{t+1}$ is the observation one time point after $x_t$, and $\bar{x}$ is the sample mean.

In practical applications, it is usually assumed that the mean and variance of the first $n-1$ observations equal those of the last $n-1$ observations, which simplifies the expression above. Also, the covariance formula uses the divisor $n$ instead of the unbiased $n-2$; when $n$ is large, the choice of divisor has little practical effect on the computation.

lag=k

More generally, the sample autocovariance function (ACVF) of a time series at lag $k$ is defined as

$c_k = \frac{1}{n}\sum_{t=1}^{n-k} (x_t - \bar{x})(x_{t+k} - \bar{x})$

and the sample autocorrelation function (ACF) is defined as

$r_k = \dfrac{c_k}{c_0}$
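To make the formulas concrete, here is a small illustrative implementation of $c_k$ and $r_k$ (not from the original notes), checked against statsmodels' acf, which uses the same divisor-$n$ convention:

```python
import numpy as np
from statsmodels.tsa.stattools import acf

def sample_acvf(x, k):
    """Sample autocovariance c_k with divisor n, as in the formula above."""
    x = np.asarray(x, dtype=float)
    n, xbar = x.size, x.mean()
    return np.sum((x[: n - k] - xbar) * (x[k:] - xbar)) / n

def sample_acf(x, k):
    """Sample autocorrelation r_k = c_k / c_0."""
    return sample_acvf(x, k) / sample_acvf(x, 0)

rng = np.random.default_rng(1)
x = rng.normal(size=200)
print(sample_acf(x, 1))
print(acf(x, nlags=1)[1])  # should agree with the hand-rolled version
```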

An interactive example to help build intuition for the autocovariance and autocorrelation functions can be found at the link below.

https://shiny.maths-stats.gla.ac.uk/gnapier/Time_Series_ACF/

Interpretation of Correlogram Plots

A correlogram is a plot with the sample autocorrelation on the vertical axis and the lag on the horizontal axis, making the correlation at different lags easy to see at a glance. Correlograms tell a time series analyst a great deal about a series, including the presence of trends, seasonal variation, and short-term correlation. The following examples illustrate this.
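In Python, a correlogram like those in the examples below can be drawn with statsmodels (an illustrative sketch, not part of the original notes):

```python
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

rng = np.random.default_rng(42)
x = rng.normal(size=200)          # purely random (white noise) series

fig, axes = plt.subplots(2, 1, figsize=(8, 6))
axes[0].plot(x)
axes[0].set_title("Raw data")
plot_acf(x, lags=30, ax=axes[1])  # correlogram: r_k against lag k
plt.tight_layout()
plt.show()
```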

Example - purely random data

Consider a time series generated by a purely random process, which has no trend, seasonality, or short-term correlation. The raw data and autocorrelation plot are shown below:

[Figure: purely random data and its correlogram]

  • When $k = 0$, $r_0 = 1$ by definition; this value is usually ignored because it is the correlation of the series with itself.

  • For a purely random series with no correlation, there is no clear evidence of correlation at any lag $k \geq 1$.

Example - short-term correlation

Time series data with no trend or seasonality but with short-term correlation is shown in the figure below: it has significant positive autocorrelation at the first few lags, followed by values close to zero at larger lags.

[Figure: short-term correlated data and its correlogram]
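Short-term correlation of this kind can be reproduced by simulating, for example, an AR(1) process (an illustrative assumption; the original does not state how its figure was generated):

```python
import numpy as np

rng = np.random.default_rng(7)
n, phi = 200, 0.7
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + rng.normal()

# The theoretical ACF of this process is rho(k) = phi**k: large at small lags,
# near zero at large lags. With phi < 0 the ACF alternates in sign, which
# matches the "alternating data" example that follows.
```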

Example - alternating data

Time series data that has no trend or seasonality but alternates between large and small values is shown in the plot below; it has negative autocorrelation at odd lags and positive autocorrelation at even lags. As the lag increases, the autocorrelation gets closer and closer to zero.

[Figure: alternating data and its correlogram]

Example - data with a trend

Time series data with a trend is shown below; the autocorrelation remains positive even at large lags. A similar correlogram would be observed if the trend were decreasing over time.

[Figure: trending data and its correlogram]

Example - data with a seasonal effect

Time series data with a seasonal effect is shown in the figure below and produces a regular seasonal pattern in the correlogram.

[Figure: seasonal data and its correlogram]

Example - data with a trend and a seasonal effect

Time series data with both a trend and a seasonal effect is shown in the plot below; the correlogram has a regular seasonal pattern whose values are mostly positive because of the trend.

[Figure: data with a trend and a seasonal effect, and its correlogram]

Stationarity Analysis

Strictly stationary (strongly stationary)

Strict stationarity is a very demanding condition. Given a time series process $\{X_t\}$, the series is strictly stationary if, for all $k$, all time points $t_1, \ldots, t_k$, and every shift $\tau$, the joint distribution of $(X_{t_1}, \ldots, X_{t_k})$ is the same as that of $(X_{t_1+\tau}, \ldots, X_{t_k+\tau})$. In other words, shifting the time origin of the series has no effect on its joint distribution.

When $k = 1$, strict stationarity implies that $X_t$ and $X_{t+\tau}$ have the same distribution for all $t$ and $\tau$. This shows that the mean and variance of the series are constant, namely

$\mu(t) = \mu$ and $\sigma^2(t) = \sigma^2$

When $k = 2$, strict stationarity implies that for all $t_1, t_2$ and $\tau$, the joint distribution of $(X_{t_1}, X_{t_2})$ is the same as that of $(X_{t_1+\tau}, X_{t_2+\tau})$:

the joint distribution depends only on the lag $\tau = t_2 - t_1$.

This in turn means that the theoretical covariance and correlation functions depend only on the lag and not on the original positions: $\gamma(t_1, t_2) = \gamma(\tau)$ and $\rho(t_1, t_2) = \rho(\tau)$.

Strict stationarity is rarely satisfied by real processes; generally only purely random processes are strictly stationary, so the weaker notion of stationarity below is used more often.

Weakly stationary

Given a time series process $\{X_t\}$, the process is weakly stationary if it satisfies the following conditions:

  1. The mean is constant and finite, i.e. $E(X_t) = \mu < \infty$ for all $t$.

  2. The variance is constant and finite, i.e. $\mathrm{Var}(X_t) = \sigma^2 < \infty$ for all $t$.

  3. The autocovariance and autocorrelation functions depend only on the lag $k$, i.e. $\gamma(t, t+k) = \gamma(k)$ and $\rho(t, t+k) = \rho(k)$.

The difference between strict and weak stationarity is that the latter only requires the first two moments (mean and variance, together with the autocovariance) to be constant over time, while the former requires higher moments to be constant as well.
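In practice, weak stationarity is often assessed informally by checking that rolling means and variances stay roughly constant, or with a unit-root test such as augmented Dickey-Fuller. A minimal sketch (added for illustration, not part of the original notes):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(3)
x = pd.Series(rng.normal(size=300))  # white noise: weakly stationary

# Informal check: the rolling mean should stay roughly constant over time
print(x.rolling(50).mean().dropna().agg(["min", "max"]))

# Augmented Dickey-Fuller test: a small p-value argues against a unit root
stat, pvalue, *_ = adfuller(x)
print(f"ADF statistic = {stat:.2f}, p-value = {pvalue:.3f}")
```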

Example

Define a random walk process $\{X_t\}$ by

$X_t = X_{t-1} + Z_t$, with $X_0 = 0$

where $\{Z_t\}$ is a purely random process with mean 0 and variance $\sigma_Z^2$. Then $\{X_t\}$ is non-stationary, because $X_t = \sum_{i=1}^{t} Z_i$ and therefore

$\mathrm{Var}(X_t) = t\,\sigma_Z^2$

This means that the variance varies with time.
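A quick simulation (illustrative, not from the original) makes the growing variance visible:

```python
import numpy as np

rng = np.random.default_rng(0)
n_paths, n_steps = 5000, 200
z = rng.normal(size=(n_paths, n_steps))  # purely random increments Z_t, variance 1
x = z.cumsum(axis=1)                     # random walk: X_t = Z_1 + ... + Z_t

# The empirical Var(X_t) across paths grows roughly linearly in t
for t in (10, 50, 200):
    print(t, round(x[:, t - 1].var(), 1))
```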
