Introduction to Deep Learning Algorithms for Time Series Prediction

bd5d5d6669a20d4e5ea578912ab69ef3.png

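Source: 算法进阶 (Algorithm Advanced)
This article is about 11,000 characters long; suggested reading time is 20+ minutes. For complex nonlinear patterns, deep learning models have strong expressive power.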


1 Overview

Deep learning is a family of machine learning methods that use neural networks for pattern recognition and automatic feature extraction, and in recent years it has achieved good results in time series prediction. Commonly used models include the recurrent neural network (RNN), long short-term memory network (LSTM), gated recurrent unit (GRU), convolutional neural network (CNN), attention mechanisms (Attention), and hybrid models (Mix). Compared with traditional machine learning, which requires complex feature engineering, these models usually need only data preprocessing, network structure design, and hyperparameter tuning to output time series predictions end-to-end.

Deep learning algorithms can automatically learn patterns and trends in time series data. Important hyperparameters of a neural network include the number of hidden layers, the number of neurons, the learning rate, and the activation functions. For complex nonlinear patterns, deep learning models have strong expressive power. When applying deep learning to time series prediction, it is necessary to consider the stationarity and periodicity of the data, select appropriate models and parameters, train and test the model, and then tune and validate it.
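As an illustration of the preprocessing step, the sketch below slices a series into (input window, target window) pairs, which is the form most of the models in this article are trained on. The function name `make_windows` and the toy data are assumptions made for this example, not part of any library.

```python
def make_windows(series, input_len, output_len):
    """Slice a 1-D series into (input window, target window) training pairs."""
    pairs = []
    last_start = len(series) - input_len - output_len
    for start in range(last_start + 1):
        x = series[start:start + input_len]
        y = series[start + input_len:start + input_len + output_len]
        pairs.append((x, y))
    return pairs


# Toy series: 7 observations, 3 steps of history, 2 steps to predict.
series = [10, 12, 13, 15, 18, 21, 25]
pairs = make_windows(series, input_len=3, output_len=2)
for x, y in pairs:
    print(x, "->", y)
```

The `input_chunk_length` and `output_chunk_length` arguments in the model snippets later in the article play the same role as `input_len` and `output_len` here.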

a6d31c31a7170ff23b4aa30a18c2e7e3.png

2 Algorithm showcase

2.1 RNN class

In an RNN, the input at each time step and the hidden state from the previous step are mapped to a new hidden state, and the output for the next step is predicted from that state. An important feature of RNNs is that they can handle variable-length sequences, which makes them well suited to time series forecasting. Gated variants such as LSTM, GRU, and SRU further improve the model's expressive power and memory.
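The recurrence described above can be sketched in a few lines of plain Python. This is a minimal Elman-style cell with hand-picked weights; the names `rnn_step` and `run_rnn` and all numeric values are assumptions made for illustration only.

```python
import math


def rnn_step(x_t, h_prev, W_x, W_h, b):
    """One recurrent step: h_t = tanh(W_x x_t + W_h h_prev + b)."""
    hidden = len(b)
    h_t = []
    for i in range(hidden):
        total = b[i]
        total += sum(W_x[i][j] * x_t[j] for j in range(len(x_t)))
        total += sum(W_h[i][j] * h_prev[j] for j in range(hidden))
        h_t.append(math.tanh(total))
    return h_t


def run_rnn(sequence, W_x, W_h, b):
    """Feed a sequence of any length through the cell, starting from a zero state."""
    h = [0.0] * len(b)
    for x_t in sequence:
        h = rnn_step(x_t, h, W_x, W_h, b)
    return h


# Hypothetical weights: 1 input feature, 2 hidden units.
W_x = [[0.5], [-0.3]]
W_h = [[0.1, 0.0], [0.0, 0.2]]
b = [0.0, 0.0]

# The same cell handles sequences of different lengths.
h_short = run_rnn([[1.0], [2.0]], W_x, W_h, b)
h_long = run_rnn([[1.0], [2.0], [0.5], [1.5]], W_x, W_h, b)
print(h_short, h_long)
```

Because the same weights are reused at every step, the hidden state has a fixed size regardless of how long the input sequence is.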

2.1.1 RNN(1990)

Paper:Finding Structure in Time

RNN (Recurrent Neural Network) is a deep learning model that is often used for time series forecasting. By unrolling the network over time, an RNN carries historical information forward, which lets it model temporal dependencies and dynamic changes in time series data. In practice, the LSTM and GRU variants are often preferred because their memory cells and gating mechanisms capture dependencies over long sequences more effectively.

 
  
# RNN (Darts)
from darts.models import RNNModel

model = RNNModel(
    model="RNN",
    hidden_dim=60,
    dropout=0,
    batch_size=100,
    n_epochs=200,
    optimizer_kwargs={"lr": 1e-3},
    # model_name="Air_RNN",
    log_tensorboard=True,
    random_state=42,
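    # Note: the Darts docs advise a training_length larger than input_chunk_length;
    # the same applies to the LSTM and GRU configurations below.
    training_length=20,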
    input_chunk_length=60,
    # force_reset=True,
    # save_checkpoints=True,
)
2c7734606ab44d4044454bc82ab84df0.png

2.1.2 LSTM(1997)

Paper:Long Short-Term Memory

LSTM (Long Short Term Memory) is a commonly used recurrent neural network model that is often used for time series forecasting. Compared with the basic RNN model, LSTM has stronger memory and long-term dependency capabilities, and can better handle timing dependencies and dynamic changes in time series data. In the construction of the LSTM model, the key is the design and parameter adjustment of the LSTM unit. The design of LSTM unit can affect the memory ability and long-term dependence ability of the model, and the adjustment of parameters can affect the prediction accuracy and robustness of the model.
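For reference, the commonly used formulation of the LSTM unit is sketched below. The notation is an assumption of this note: \(x_t\) is the input, \(h_t\) the hidden state, \(c_t\) the cell state, \(\sigma\) the logistic sigmoid, and \(\odot\) element-wise multiplication. The forget gate \(f_t\) belongs to the now-standard variant of the unit.

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```

The additive update of \(c_t\) is what lets information and gradients flow across many time steps.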

# LSTM (Darts)
from darts.models import RNNModel

model = RNNModel(
    model="LSTM",
    hidden_dim=60,
    dropout=0,
    batch_size=100,
    n_epochs=200,
    optimizer_kwargs={"lr": 1e-3},
    # model_name="Air_RNN",
    log_tensorboard=True,
    random_state=42,
    training_length=20,
    input_chunk_length=60,
    # force_reset=True,
    # save_checkpoints=True,
)
960e577151df48cef7c0c7e004a06edc.png


2.1.3 GRU(2014)

Paper:Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation

GRU (Gated Recurrent Unit) is a commonly used recurrent neural network model, similar to the LSTM model, and it is also a model specially designed for processing time series data. Compared with the LSTM model, the GRU model has fewer parameters and faster operation speed, but it can still deal with the timing dependence and dynamic changes in time series data. In the construction of the GRU model, the key is the design and parameter adjustment of the GRU unit. The design of the GRU unit can affect the memory ability and long-term dependence ability of the model, and the adjustment of parameters can affect the prediction accuracy and robustness of the model.
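A common way to write the GRU unit is sketched below, using the same assumed notation as for other recurrent cells (\(x_t\) input, \(h_t\) hidden state, \(\sigma\) sigmoid, \(\odot\) element-wise product). Some references swap the roles of \(z_t\) and \(1 - z_t\) in the last line.

```latex
\begin{aligned}
z_t &= \sigma(W_z x_t + U_z h_{t-1} + b_z) \\
r_t &= \sigma(W_r x_t + U_r h_{t-1} + b_r) \\
\tilde{h}_t &= \tanh\bigl(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h\bigr) \\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t
\end{aligned}
```

The unit has three sets of weights (update gate, reset gate, candidate state) where the LSTM has four, which is where the smaller parameter count comes from.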

 
  
# GRU (Darts)
from darts.models import RNNModel

model = RNNModel(
    model="GRU",
    hidden_dim=60,
    dropout=0,
    batch_size=100,
    n_epochs=200,
    optimizer_kwargs={"lr": 1e-3},
    # model_name="Air_RNN",
    log_tensorboard=True,
    random_state=42,
    training_length=20,
    input_chunk_length=60,
    # force_reset=True,
    # save_checkpoints=True,
)
d3c26a62ff97fea2e75b0ea814794ed3.png


2.1.4 SRU(2018)

Paper:Simple Recurrent Units for Highly Parallelizable Recurrence

SRU (Simple Recurrent Unit) is a recurrent neural network model designed for highly parallelizable recurrence. Compared with LSTM and GRU, most of the SRU's computation at each step does not depend on the previous hidden state, so it can be computed in parallel across time steps; only a lightweight element-wise recurrence remains sequential. As a result the SRU runs faster while still modeling the temporal dependencies and dynamic changes in time series data. When building an SRU model, the key choices are the design of the SRU unit, which affects the model's memory and long-term dependency capabilities, and the hyperparameter settings, which affect prediction accuracy and robustness.
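The SRU recurrence can be sketched as follows. The notation is an assumption of this note: \(x_t\) is the input, \(c_t\) the internal state, \(h_t\) the output, and \(v_f, v_r\) are parameter vectors; the last line assumes the input and the state have the same dimension.

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + v_f \odot c_{t-1} + b_f) \\
c_t &= f_t \odot c_{t-1} + (1 - f_t) \odot (W x_t) \\
r_t &= \sigma(W_r x_t + v_r \odot c_{t-1} + b_r) \\
h_t &= r_t \odot c_t + (1 - r_t) \odot x_t
\end{aligned}
```

All matrix products involve only \(x_t\), so they can be computed for every time step at once; the dependence on \(c_{t-1}\) is purely element-wise.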

2.2 CNN class

CNNs extract features from time series data automatically through convolutional and pooling layers. To apply a CNN, the series is typically arranged as a matrix of windows (for example, time steps by features), convolution and pooling are used to extract and compress features, and a fully connected layer produces the prediction. Compared with traditional time series forecasting methods, CNNs can learn complex patterns in the data automatically, and they are efficient to compute because convolutions run in parallel across time steps.

2.2.1 WaveNet(2016)

Paper:WaveNet: A Generative Model for Raw Audio

WaveNet is a neural network model for speech generation proposed by DeepMind in 2016. Its core idea is to model the raw waveform of an audio signal with stacked dilated causal convolutions, using residual connections and gated activation units to improve the model's representational power. Beyond speech generation, WaveNet can also be applied to time series prediction, where the task is to predict the value at the next time step of a given series. Typically the series is treated as a one-dimensional sequence and fed to the model, which outputs the prediction for the next time step.

In the construction of the WaveNet model, the key is the design and parameter adjustment of the convolutional layer. The design of the convolutional layer can affect the expression ability and generalization ability of the model, and the adjustment of parameters can affect the prediction accuracy and robustness of the model.
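To make the convolutional layer concrete, here is a minimal dependency-free sketch of a dilated causal 1-D convolution, the kind of layer WaveNet stacks. The function names, the averaging weights, and the dilation schedule are assumptions chosen for illustration; real models learn the weights and add gating and residual paths.

```python
def causal_dilated_conv(x, weights, dilation):
    """y[t] = sum_k weights[k] * x[t - k * dilation], with zeros before the start.

    The output has the same length as the input and y[t] never uses x[t+1:].
    """
    y = []
    for t in range(len(x)):
        total = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                total += w * x[idx]
        y.append(total)
    return y


def wavenet_like_stack(x, weights=(0.5, 0.5), dilations=(1, 2, 4)):
    """Stack layers with growing dilation so the receptive field doubles each time."""
    out = x
    for d in dilations:
        out = causal_dilated_conv(out, weights, d)
    return out


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
h1 = causal_dilated_conv(x, [0.5, 0.5], dilation=1)
out = wavenet_like_stack(x)
print(h1)
print(out[-1])  # the last output summarizes all 8 inputs
```

With kernel size 2 and dilations 1, 2, 4, the last output depends on 8 consecutive inputs even though only three layers are used.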

2.2.2 TCN(2018)

Paper:An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling

TCN (Temporal Convolutional Network) is a time series prediction algorithm based on convolutional neural networks. It was designed to address the vanishing gradients and high computational cost that traditional RNNs (recurrent neural networks) face on long sequences. Compared with sequence models such as RNNs, a TCN models long-term dependencies with convolutions, which parallelize well and train faster. The model consists of stacked convolutional layers with residual connections, where each layer's output feeds the next, so features are abstracted layer by layer. The ResNet-style residual connections reduce problems such as vanishing gradients and model degradation, and dilated convolutions enlarge the receptive field of the convolution kernels, so the model can see a longer history with few layers.

The structure of the TCN model is shown in the figure below:

711516ba1e834717f56eb932cf25ecc9.png

The prediction process of the TCN model includes the following steps:

  • Input Layer: Receives input of time series data.

  • Convolution layer: One-dimensional convolution is used to extract and abstract the input data. Each convolution layer contains multiple convolution kernels, which can capture time series patterns of different scales.

  • Residual connection: As in ResNet, the output of the convolutional layer is added to its input, which reduces problems such as vanishing gradients and model degradation and improves the robustness of the model.

  • Repeated stacking: Multiple convolutional layers and residual connections are stacked repeatedly to extract abstract features of time series data layer by layer.

  • Pooling layer: Add a global average pooling layer after the last convolutional layer to average all feature vectors to obtain a fixed-length feature vector.

  • Output layer: The output of the pooling layer is output through a fully connected layer to obtain the predicted value of the time series.

The advantages of the TCN model include:

  • It can handle long sequence data and has good parallelism.

  • Techniques such as residual connections and dilated convolutions mitigate vanishing gradients, and dropout helps limit overfitting.

  • Compared with the traditional RNN model, the TCN model is more efficient to compute and often achieves better prediction accuracy.
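How far back a TCN can see follows from simple arithmetic on kernel size, dilation base, and depth. The sketch below assumes layer i uses dilation `dilation_base ** i` and that each layer contains `convs_per_layer` convolutions; both function names and the `convs_per_layer` parameter are assumptions of this example, and specific implementations may count layers differently.

```python
def tcn_receptive_field(kernel_size, num_layers, dilation_base=2, convs_per_layer=1):
    """Number of input steps that can influence one output step."""
    total_dilation = sum(dilation_base ** i for i in range(num_layers))
    return 1 + convs_per_layer * (kernel_size - 1) * total_dilation


def layers_needed(kernel_size, window, dilation_base=2, convs_per_layer=1):
    """Smallest depth whose receptive field covers `window` input steps."""
    n = 1
    while tcn_receptive_field(kernel_size, n, dilation_base, convs_per_layer) < window:
        n += 1
    return n


# kernel_size=5 and a 13-step input window, as in the TCN configuration in this section
print(tcn_receptive_field(kernel_size=5, num_layers=2))
print(layers_needed(kernel_size=5, window=13))
# kernel_size=2 and a 60-step input window
print(layers_needed(kernel_size=2, window=60))
```

Because the dilation grows exponentially, the depth required grows only logarithmically with the length of the input window.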

 
  
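# Model construction (Darts)
import matplotlib.pyplot as plt
from darts.models import TCNModel

# train, val, ts and the month covariate series are assumed to be
# darts TimeSeries objects prepared earlier.
TCN = TCNModel(
    input_chunk_length=13,
    output_chunk_length=12,
    n_epochs=200,
    dropout=0.1,
    dilation_base=2,
    weight_norm=True,
    kernel_size=5,
    num_filters=3,
    random_state=0,
)
# Option 1: model training without covariates
TCN.fit(series=train,
        val_series=val,
        verbose=True
)
# Option 2 (alternative to option 1): model training with covariates
TCN.fit(series=train,
        past_covariates=train_month,
        val_series=val,
        val_past_covariates=val_month,
        verbose=True
)
# Model inference: backtest over the last 25% of the series
backtest = TCN.historical_forecasts(
    series=ts,
    # past_covariates=month_series,
    start=0.75,
    forecast_horizon=10,
    retrain=False,
    verbose=True,
)
# Visualize the results
ts.plot(label="actual")
backtest.plot(label="backtest (D=10)")
plt.legend()
plt.show()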

Exploring the impact of data normalization on time series forecasting

Whether month covariates are generated from the original data, and whether the data is normalized, both have a large impact on the final forecast. In this experiment the original data is on a 0-100 scale, for which the combination of no normalization plus covariates worked best; the covariates themselves should be chosen according to the actual business context.

6b462c68bd599b53550af2e9a3740caa.png

Normalized & no covariates

0e54020c2e7fdf509cd7eb73abd07daa.png

Normalized & with covariates

80051ae399a831bd65033dca0926cd2e.png

Not normalized & no covariates

df748dddaf2e62406207f45a5f056071.png

Not normalized & with covariates
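The two preprocessing choices compared above can be sketched without any library. The helper names (`fit_min_max`, `scale`, `unscale`, `month_covariate`) and the sample values are assumptions made for this example; the key point is that the scaling range is fitted on the training split only and then reused for validation data and for mapping forecasts back.

```python
def fit_min_max(train):
    """Learn the scaling range from the training split only."""
    lo, hi = min(train), max(train)
    if hi == lo:
        raise ValueError("constant series cannot be min-max scaled")
    return lo, hi


def scale(values, lo, hi):
    return [(v - lo) / (hi - lo) for v in values]


def unscale(values, lo, hi):
    return [v * (hi - lo) + lo for v in values]


def month_covariate(months):
    """Map month numbers 1..12 to [0, 1] so they can be fed as a covariate."""
    return [(m - 1) / 11 for m in months]


train = [62.0, 70.0, 81.0, 90.0]   # hypothetical scores on a 0-100 scale
val = [85.0, 95.0]

lo, hi = fit_min_max(train)
train_scaled = scale(train, lo, hi)
val_scaled = scale(val, lo, hi)      # may fall outside [0, 1]
restored = unscale(val_scaled, lo, hi)
print(train_scaled, val_scaled, restored)
print(month_covariate([1, 6, 12]))
```

Validation values above the training maximum map to values greater than 1, which is one reason normalization can interact badly with short, bounded series like the one in this experiment.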

2.2.3 DeepTCN(2019)

Paper:Probabilistic Forecasting with Temporal Convolutional Neural Network

Code:deepTCN

DeepTCN (Deep Temporal Convolutional Network) is a deep learning model for probabilistic time series forecasting that builds on the TCN architecture. It processes the series with stacked 1D dilated causal convolutional layers, each containing multiple convolution kernels and activation functions, and uses residual connections and batch normalization to speed up training. Instead of a single point forecast, the model can output the parameters of a predictive distribution (or quantiles), which is why the code below attaches a Gaussian likelihood.

The training process of the DeepTCN model usually involves the following steps:

  • Data preprocessing: Standardize or normalize the original time series data to reduce the impact of inconsistent feature scales on model training.

  • Model construction: Build the DeepTCN model from multiple 1D dilated convolutional layers with residual connections, using a deep learning framework such as TensorFlow or PyTorch.

  • Model training: Train the DeepTCN model on the training set, measuring fit with a loss function such as MSE or, for probabilistic outputs, a negative log-likelihood or quantile loss. Optimization algorithms such as SGD or Adam update the model parameters, and techniques such as batch normalization and dropout improve generalization.

  • Model evaluation: Evaluate the trained DeepTCN model on the test set and calculate performance metrics such as mean absolute error (MAE) and mean absolute percentage error (MAPE).
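The evaluation metrics named in the steps above are simple to compute by hand. The sketch below uses plain Python lists and made-up numbers; the function names are assumptions of this example, and MAPE as written requires all actual values to be non-zero.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]
print(mae(actual, predicted), rmse(actual, predicted), mape(actual, predicted))
```

MAE and RMSE are in the units of the series, while MAPE is scale-free, which makes it easier to compare across series of different magnitude.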

Exploring the impact of training input/output length and batch size on time series forecasting

In this experiment, the limited number of original data samples means that the input and output lengths and batch_size cannot be set too large. In terms of performance, a large batch_size combined with short input and output lengths is recommended.

 
  
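# Imports (Darts)
from darts.models import TCNModel
from darts.utils.likelihood_models import GaussianLikelihood

# Short input and output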
deeptcn = TCNModel(
    input_chunk_length=13,
    output_chunk_length=12,
    kernel_size=2,
    num_filters=4,
    dilation_base=2,
    dropout=0.1,
    random_state=0,
    likelihood=GaussianLikelihood(),
)
# Long input and output
deeptcn = TCNModel(
    input_chunk_length=60,
    output_chunk_length=20,
    kernel_size=2,
    num_filters=4,
    dilation_base=2,
    dropout=0.1,
    random_state=0,
    likelihood=GaussianLikelihood(),
)
# Long input and output, large batch_size
deeptcn = TCNModel(
    batch_size=60,
    input_chunk_length=60,
    output_chunk_length=20,
    kernel_size=2,
    num_filters=4,
    dilation_base=2,
    dropout=0.1,
    random_state=0,
    likelihood=GaussianLikelihood(),
)
# Short input and output, large batch_size
deeptcn = TCNModel(
    batch_size=60,
    input_chunk_length=13,
    output_chunk_length=12,
    kernel_size=2,
    num_filters=4,
    dilation_base=2,
    dropout=0.1,
    random_state=0,
    likelihood=GaussianLikelihood(),
)

45645ab5dc98f95791674ea5815aa8bd.png

Short input and output

b9ceac8163a3932c0854d9e18f312ba2.png

Long input and output

b3bfae21c551d57e97b43f8c5420916e.png

Long input and output, large batch_size

4c2844607d7bf1694dcd34d898a5786f.png

Short input and output, large batch_size

2.3 Attention class

The attention mechanism (Attention) extracts the most relevant features from sequential input data and has also been applied to time series prediction. It automatically focuses on the important parts of the series and so gives the model more useful information, which improves prediction accuracy. In practice, attention assigns an adaptive weight to each part of the input, so the model attends more to key information and less to irrelevant information. Attention can be combined with sequence models such as RNNs as well as with non-recurrent models such as CNNs, and it is one of the most active topics in time series prediction.
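The adaptive weighting described above is usually implemented as scaled dot-product attention. Here is a minimal single-query sketch in plain Python; the function names and the toy keys and values are assumptions made for this illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, context


# Three past time steps, each with a 2-d key and a 1-d value.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0], [20.0], [30.0]]
weights, context = attend([2.0, 2.0], keys, values)
print(weights, context)
```

The third time step matches the query best, so it receives the largest weight and dominates the resulting context vector.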

2.3.1 Transformer(2017)

Paper:Attention Is All You Need

Transformer is a neural network model widely used in natural language processing (NLP); in essence it is a sequence-to-sequence (seq2seq) model. The Transformer represents each position in the sequence as a vector and uses multi-head self-attention and feed-forward neural networks to capture long-range dependencies, so the model can handle variable-length input and output sequences.

b4a3fe2f1d7a8d22545f71049c990925.jpeg

In time series prediction, the Transformer can use the time step of each input element as its position information, represent the features at each time step as a vector, and predict with the encoder-decoder framework. Specifically, the first N time steps of the target series serve as the encoder input and the last M time steps serve as the decoder input. Both the encoder and the decoder are stacks of Transformer blocks, each consisting of a multi-head self-attention layer and a feed-forward neural network layer.

During the training process, common loss functions such as mean square error (MSE) or mean absolute error (MAE) can be used to measure the predictive performance of the model, and optimization algorithms such as stochastic gradient descent (SGD) or Adam can be used to update model parameters. In the process of model training, techniques such as learning rate adjustment and gradient clipping can also be used to speed up model training and improve model performance.

 
  
# Transformer (Darts)
from darts.models import TransformerModel

model = TransformerModel(
    input_chunk_length=30,
    output_chunk_length=15,
    batch_size=32,
    n_epochs=200,
    # model_name="air_transformer",
    nr_epochs_val_period=10,
    d_model=16,
    nhead=8,
    num_encoder_layers=2,
    num_decoder_layers=2,
    dim_feedforward=128,
    dropout=0.1,
    optimizer_kwargs={"lr": 1e-2},
    activation="relu",
    random_state=42,
    # save_checkpoints=True,
    # force_reset=True,
)
6e2c624342e671f9ab798b539584c643.png


2.3.2 TFT(2019)

Paper:Temporal Fusion Transformers for Interpretable Multi-horizon Time Series Forecasting

TFT (Temporal Fusion Transformer) is an attention-based time series forecasting method proposed in 2019 by researchers from the University of Oxford and Google Cloud AI. Its core idea is to combine recurrent layers for local temporal processing with interpretable multi-head self-attention for long-range dependencies, and to handle heterogeneous inputs explicitly: static covariates, known future inputs (such as holidays), and observed past inputs (such as temperature) each get their own embedding, and variable selection networks with gating decide which inputs matter. These embeddings help the model learn periodicity and trend in the series while taking external influencing factors into account alongside the series itself.

f70929b934200e1d94169b4167e73781.png

The TFT method has two phases: training and prediction. In the training phase, the model is fit on the training data, with techniques such as random masking and adaptive learning rate adjustment used to improve robustness and training efficiency. In the prediction phase, the trained model forecasts future values of the series.

Compared with the traditional time series prediction method, the TFT method has the following advantages:

  • It can better handle time series data of different scales, because the Transformer model can learn the global and local features of time series.

  • Time series data and external influencing factors can be considered at the same time, thereby improving forecasting accuracy.

  • The predictive model can be directly learned through end-to-end training without manual feature extraction.
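TFT is a multi-horizon forecaster, and its paper trains the network with a quantile (pinball) loss so that it outputs prediction intervals rather than a single value. A minimal sketch of that loss is shown below; the function name and the sample numbers are assumptions of this example.

```python
def pinball_loss(actual, predicted, q):
    """Average quantile loss for quantile level q in (0, 1)."""
    total = 0.0
    for y, y_hat in zip(actual, predicted):
        diff = y - y_hat
        # Under-prediction is weighted by q, over-prediction by (1 - q).
        total += max(q * diff, (q - 1) * diff)
    return total / len(actual)


actual = [10.0, 12.0, 9.0]
under = [8.0, 10.0, 7.0]    # forecasts 2 units too low
over = [12.0, 14.0, 11.0]   # forecasts 2 units too high

print(pinball_loss(actual, under, q=0.9))
print(pinball_loss(actual, over, q=0.9))
print(pinball_loss(actual, under, q=0.5))
```

For a high quantile such as 0.9, under-prediction is penalized far more than over-prediction, which pushes that output toward the upper end of the distribution.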

 
  
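# TFT
# Note: as published, this snippet instantiates Darts' TransformerModel with the same
# settings as section 2.3.1; Darts also provides a dedicated TFTModel class.
from darts.models import TransformerModel
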
model = TransformerModel(
    input_chunk_length=30,
    output_chunk_length=15,
    batch_size=32,
    n_epochs=200,
    # model_name="air_transformer",
    nr_epochs_val_period=10,
    d_model=16,
    nhead=8,
    num_encoder_layers=2,
    num_decoder_layers=2,
    dim_feedforward=128,
    dropout=0.1,
    optimizer_kwargs={"lr": 1e-2},
    activation="relu",
    random_state=42,
    # save_checkpoints=True,
    # force_reset=True,
)
a5938a96661901794d364354e9206263.png

2.3.3 HT(2019)

HT (Hierarchical Transformer) is a time series prediction algorithm based on the Transformer model, proposed by researchers at the Chinese University of Hong Kong. The HT model adopts a hierarchical structure to process time-series data with multiple time scales, and captures the features of different time scales through an adaptive attention mechanism to improve the predictive performance and generalization ability of the model.

The HT model consists of two main components: a multi-scale attention module and a prediction module. In the multi-scale attention module, the HT model captures the features of different time scales through an adaptive multi-head attention mechanism, and fuses the features of different time scales into a common feature representation. In the prediction module, the HT model uses the fully connected layer to predict the feature representation and output the final prediction result.

The advantage of the HT model is that it adapts to time series with multiple time scales, which improves prediction performance and generalization. The HT model is also described as having good interpretability, and it can be applied to a variety of time series prediction tasks.

2.3.4 LogTrans(2019)

Paper:Enhancing the Locality and Breaking the Memory Bottleneck of Transformer on Time Series Forecasting

Code:Autoformer

LogTrans proposes two improvements for Transformer-based time series prediction: convolutional self-attention, which generates queries and keys with causal convolutions so that local context is incorporated into the attention mechanism, and the LogSparse Transformer, a more memory-efficient Transformer variant that reduces the memory cost of modeling long time series. Together they address two main weaknesses of the Transformer in time series prediction: locality-agnostic attention and the memory bottleneck.
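The idea behind LogSparse attention is that each position attends only to itself and to earlier positions at exponentially growing distances. The sketch below lists those positions using 0-based indexing; the function name is an assumption, and the paper combines this pattern with further variants (such as local attention) that are not shown here.

```python
def logsparse_indices(t):
    """Positions that position t may attend to: itself plus t-1, t-2, t-4, t-8, ..."""
    indices = {t}
    step = 1
    while t - step >= 0:
        indices.add(t - step)
        step *= 2
    return sorted(indices)


print(logsparse_indices(10))
print(len(logsparse_indices(1000)), "cells instead of", 1000 + 1)
```

Each position attends to a number of cells that grows only logarithmically with its index, which is what reduces the memory cost on long series.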

2.3.5 DeepTTF(2020)

DeepTTF (Deep Temporal Transformational Factorization) is a time series prediction algorithm based on deep learning and matrix decomposition, proposed by researchers at the University of California, Los Angeles. The DeepTTF model decomposes a time series into multiple time periods and models each time period using matrix factorization techniques to improve the predictive performance and interpretability of the model.

The DeepTTF model consists of three main components: time segmentation, matrix factorization, and predictor. In the time segmentation stage, the DeepTTF model divides the time series into multiple time segments, each of which contains a continuous period of time. In the matrix decomposition stage, the DeepTTF model decomposes each time period into two low-dimensional matrices, which represent the relationship between time and features, respectively. In the predictor stage, the DeepTTF model uses a multi-layer perceptron to make predictions for each time period, and combines the prediction results into a final prediction sequence.

The advantage of the DeepTTF model is that it can effectively capture local patterns and global trends in time series while maintaining high forecasting accuracy and interpretability. In addition, the DeepTTF model also supports cross-validation based on time segments to improve the robustness and generalization ability of the model.

2.3.6 PTST(2020)

Probabilistic Time Series Transformer (PTST) is a time series prediction algorithm based on the Transformer model, which was proposed by Google Brain in 2020. The algorithm uses a probabilistic graphical model to improve the accuracy and reliability of time series forecasting, and can achieve better performance in time series data with greater uncertainty.

The PTST model mainly consists of two parts: sequence model and probability model. The sequence model adopts the Transformer structure, which can encode and decode time series data, and use the self-attention mechanism to pay attention to and extract important information in the sequence. The probabilistic model introduces variational autoencoder (VAE) and Kalman filter (KF) to capture uncertainty and noise in time series data.

Specifically, the sequence model of the PTST model uses the Transformer Encoder-Decoder structure for timing prediction. The Encoder part uses a multi-layer self-attention mechanism to extract the features of the input sequence, and the Decoder part gradually generates the output sequence through autoregressive methods. On this basis, the probability model introduces a random variable, the noise term of the series data, which is modeled as a normal distribution. At the same time, in order to reduce potential errors, the probability model also uses KF to smooth the sequence.

During training, PTST employs the Maximum A Posteriori Probability (MAP) estimation method to maximize the predicted probability. In the prediction phase, PTST utilizes Monte Carlo sampling to sample from the posterior distribution to generate a set of probability distributions. At the same time, in order to measure the accuracy of prediction, PTST also introduces loss functions such as mean square error and negative log likelihood (NLL).

2.3.7 Reformer(2020)

Paper:Reformer: The Efficient Transformer

Reformer is a neural network architecture based on the Transformer that has promising applications in time series prediction. A Reformer can be used for forecasting in an autoregressive or multi-step fashion: known historical time steps are fed into the model, which generates values for future time steps. The Reformer makes the Transformer more memory-efficient and scalable on long sequences by replacing full dot-product attention with locality-sensitive hashing (LSH) attention and by using reversible residual layers, so activations need not be stored for every layer. In short, the Reformer offers an efficient way to apply Transformers to long time series.
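The reversible layers mentioned above let a network recompute a layer's inputs from its outputs instead of storing them. The sketch below shows the coupling on plain lists; `f` and `g` are arbitrary stand-ins chosen for this example (in the Reformer they are the attention and feed-forward sublayers).

```python
def f(x):
    """Stand-in for the attention sublayer (any deterministic function works)."""
    return [0.5 * v + 1.0 for v in x]


def g(x):
    """Stand-in for the feed-forward sublayer."""
    return [v * v for v in x]


def reversible_forward(x1, x2):
    y1 = [a + b for a, b in zip(x1, f(x2))]
    y2 = [a + b for a, b in zip(x2, g(y1))]
    return y1, y2


def reversible_inverse(y1, y2):
    """Recover the inputs from the outputs, undoing the two steps in reverse order."""
    x2 = [a - b for a, b in zip(y2, g(y1))]
    x1 = [a - b for a, b in zip(y1, f(x2))]
    return x1, x2


x1, x2 = [1.0, 2.0], [3.0, 4.0]
y1, y2 = reversible_forward(x1, x2)
r1, r2 = reversible_inverse(y1, y2)
print(y1, y2)
print(r1, r2)
```

Since the inputs can be reconstructed exactly, the backward pass does not need the activations of every layer kept in memory.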

2.3.8 Informer(2020)

Paper:Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting

Code: https://github.com/zhouhaoyi/Informer2020

Informer is a Transformer-based method for long sequence time series forecasting, proposed in 2020 by researchers from Beihang University and collaborators. Unlike the vanilla Transformer, Informer introduces new structures and mechanisms to cope with long input and output sequences. The core ideas of the Informer method include:

  • ProbSparse self-attention: only the most informative queries take part in the full attention computation, which reduces time and memory cost from O(L^2) to O(L log L) for a sequence of length L.

  • Self-attention distilling: convolution and pooling between encoder layers progressively shorten the sequence, keeping the dominant features and reducing memory use for long inputs.

  • Generative-style decoder: the decoder predicts the whole output sequence in one forward pass instead of step by step, which speeds up inference and avoids error accumulation.

In the training phase, Informer can be trained with loss functions such as mean square error (MSE) or mean absolute error (MAE, also called L1 loss), using the Adam optimization algorithm to update the model parameters. In the prediction phase, a sliding window over the series is used to predict values at future time points.

Informer was evaluated on multiple time series forecasting datasets and compared with other popular forecasting methods. The experimental results show good performance in terms of prediction accuracy, training speed, and computational efficiency.

2.3.9 TAT(2021)

TAT (Temporal Attention Transformer) is a time series prediction algorithm based on the Transformer model, proposed by the Intelligence Science Laboratory of Peking University. The TAT model adds a time attention mechanism to the traditional Transformer model, which can better capture the dynamic changes in the time series.

The basic structure of the TAT model is similar to Transformer, including multiple Encoder and Decoder layers. Each Encoder layer includes a multi-head self-attention mechanism and a feed-forward network to extract features from the input sequence. Each Decoder layer includes a multi-head self-attention mechanism, a multi-head attention mechanism, and a feed-forward network for gradually generating output sequences. Different from the traditional Transformer model, the TAT model introduces a temporal attention mechanism in the multi-head attention mechanism to capture the dynamic changes in time series. Specifically, the TAT model takes the time step information as an additional feature input, and then uses the multi-head attention mechanism to focus on and extract the time steps to assist the model in modeling the dynamic changes in the sequence. In addition, the TAT model also uses incremental training technology to improve the training efficiency and prediction performance of the model.

2.3.10 NHT(2021)

Paper:Nested Hierarchical Transformer: Towards Accurate, Data-Efficient and Interpretable Visual Understanding

NHT (Nested Hierarchical Transformer) uses a nested, hierarchical transformer structure: self-attention is applied within local blocks, and the blocks are aggregated level by level. Note that the cited paper proposes the architecture for visual understanding tasks; it is listed here as a hierarchical attention design that could be adapted to time series data, with multi-level nested self-attention capturing patterns at different scales.

2.3.11 Autoformer(2021)

Paper:Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting

Code:https://github.com/thuml/Autoformer

AutoFormer is a time series prediction model based on the Transformer structure, designed for long-term forecasting. Compared with traditional models such as RNN and LSTM, and with the vanilla Transformer, AutoFormer has the following characteristics:

  • Series decomposition: decomposition blocks inside the network progressively separate the series into trend and seasonal components, rather than treating decomposition as a one-off preprocessing step.

  • Auto-Correlation mechanism: in place of point-wise self-attention, AutoFormer discovers period-based dependencies from the autocorrelation of the series and aggregates similar sub-series, which is more efficient on long sequences.

  • Transformer structure: AutoFormer keeps the encoder-decoder Transformer layout, which supports parallel computing and improves training efficiency.

  • Multivariate forecasting: AutoFormer can predict multiple time series at the same time.

The overall structure of AutoFormer is similar to the Transformer, with an encoder and a decoder. The encoder stacks Auto-Correlation layers and feed-forward layers to extract seasonal features from the input sequence. The decoder stacks the same kinds of layers and accumulates the trend component while refining the seasonal one, converting the encoder output into the prediction sequence. Overall, AutoFormer is an efficient and accurate model suited to long-horizon time series forecasting tasks.
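The paper title names two ingredients, series decomposition and Auto-Correlation. The sketch below illustrates both on a toy series: a moving average splits the series into trend and seasonal parts, and the lag with the highest autocorrelation reveals the dominant period. All function names are assumptions of this example, and the autocorrelation is computed directly rather than with the FFT used for efficiency in practice.

```python
def moving_average(series, window):
    """Moving average with edge padding (first/last value repeated)."""
    half = window // 2
    padded = [series[0]] * half + list(series) + [series[-1]] * (window - 1 - half)
    return [sum(padded[i:i + window]) / window for i in range(len(series))]


def decompose(series, window):
    """Split a series into a smooth trend and the remaining seasonal part."""
    trend = moving_average(series, window)
    seasonal = [x - t for x, t in zip(series, trend)]
    return trend, seasonal


def autocorrelation(series, lag):
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    denom = sum(c * c for c in centered)
    return sum(centered[i] * centered[i - lag] for i in range(lag, n)) / denom


def dominant_lag(series, max_lag):
    """Lag in 1..max_lag with the highest autocorrelation."""
    return max(range(1, max_lag + 1), key=lambda lag: autocorrelation(series, lag))


# Toy series: linear trend plus a pattern that repeats every 4 steps.
pattern = [0.0, 2.0, 0.0, -2.0]
series = [0.1 * t + pattern[t % 4] for t in range(48)]

trend, seasonal = decompose(series, window=4)
best = dominant_lag(seasonal, max_lag=10)
print(best)
```

Once the dominant lags are known, sub-series shifted by those lags can be aggregated, which is the role attention weights play in a standard Transformer.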

61d8a6fc4b2679aefc1dfae2f7705432.png

2.3.12 Pyraformer(2022)

Paper:Pyraformer: Low-complexity Pyramidal Attention for Long-range Time Series Modeling and Forecasting

Code: https://github.com/ant-research/Pyraformer

Ant Research Institute proposes a new pyramid attention-based Transformer (Pyraformer) to bridge the gap between capturing long-range dependencies and achieving low time and space complexity. Specifically, a pyramidal attention mechanism is developed by passing attention-based messages along the edges of a pyramidal graph, as shown in Figure (d). The edges in this graph fall into two groups: inter-scale connections and intra-scale connections. Inter-scale connections build a multiresolution representation of the original series: nodes at the finest scale correspond to time points in the original time series (e.g., hourly observations), while nodes at coarser scales represent lower-resolution features (e.g., daily, weekly, and monthly patterns).

These latent coarse-scale nodes are initially produced by a coarse-scale construction module. Intra-scale edges, on the other hand, capture the temporal correlation at each resolution by connecting neighboring nodes. The model therefore gives a compact representation of long-term dependencies between distant positions by capturing them at a coarser resolution, which shortens the path a signal has to travel. Furthermore, because each node attends only to a few neighbors at its own scale and to its adjacent scales, modeling dependencies of different ranges at different scales significantly reduces the computational cost.
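To make the graph structure concrete, the following sketch builds a small pyramid and measures how many hops separate two distant time points. It is only an illustration: the function names, the number of children per node, and the window size are assumptions, and no attention is computed.

```python
from collections import deque


def build_pyramid(length, children, num_scales):
    """Number of nodes per scale; each coarser node summarizes `children` finer nodes."""
    sizes = [length]
    for _ in range(num_scales - 1):
        sizes.append(sizes[-1] // children)
    return sizes


def neighbors(scale, index, sizes, children, window):
    """Nodes that (scale, index) is connected to in the pyramidal graph."""
    result = set()
    half = window // 2
    for j in range(index - half, index + half + 1):      # intra-scale edges
        if 0 <= j < sizes[scale]:
            result.add((scale, j))
    if scale > 0:                                        # edges to children
        for j in range(index * children, (index + 1) * children):
            if j < sizes[scale - 1]:
                result.add((scale - 1, j))
    if scale + 1 < len(sizes):                           # edge to parent
        parent = index // children
        if parent < sizes[scale + 1]:
            result.add((scale + 1, parent))
    return result


def hops(start, goal, sizes, children, window):
    """Breadth-first search for the shortest path length between two nodes."""
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in neighbors(node[0], node[1], sizes, children, window):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


sizes = build_pyramid(length=16, children=4, num_scales=3)
with_pyramid = hops((0, 0), (0, 15), sizes, children=4, window=3)
flat_only = hops((0, 0), (0, 15), [16], children=4, window=3)
print(with_pyramid, flat_only)
```

Comparing the two hop counts shows the effect described above: the coarse scales act as shortcuts between distant time points, while each node still has only a handful of neighbors.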

0c802d0e8c8cc02bce3dd4700695b0dd.png 45900f61ea993b630c962ff71a8beb6a.png

2.3.13 FEDformer(2022)

Paper:FEDformer: Frequency Enhanced Decomposed Transformer for Long-term Series Forecasting

Code: https://github.com/MAZiqing/FEDformer

FEDformer is a neural network structure based on the Transformer model, designed for long-term time series prediction. The model combines seasonal-trend decomposition with the Transformer: the decomposition blocks capture the global profile (trend) of the series, while the Transformer captures its more detailed structure. FEDformer replaces standard attention with frequency enhanced blocks and frequency enhanced attention, which operate on a Fourier or wavelet representation of the series. Because most time series have a sparse representation in the frequency domain, the model keeps only a fixed number of randomly selected frequency components, which gives computational complexity linear in the sequence length.
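The "frequency enhanced" idea in the paper title rests on representing a series by a small set of frequency components. A minimal sketch of that step, assuming a naive discrete Fourier transform and an explicitly chosen list of modes (the paper selects modes randomly and applies learned weights to them; `keep_modes` is a hypothetical name):

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform, O(n^2); fine for a short example."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse transform; returns the real part of each reconstructed point."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real for t in range(n)]


def keep_modes(x, modes):
    """Zero every frequency except the selected modes and their conjugates."""
    n = len(x)
    spectrum = dft(x)
    keep = set()
    for m in modes:
        keep.add(m % n)
        keep.add((-m) % n)       # conjugate bin, needed for a real-valued result
    filtered = [spectrum[k] if k in keep else 0 for k in range(n)]
    return idft(filtered)


n = 16
low = [math.sin(2 * math.pi * 2 * t / n) for t in range(n)]              # 2 cycles
x = [low[t] + 0.5 * math.sin(2 * math.pi * 5 * t / n) for t in range(n)]  # plus 5 cycles
smoothed = keep_modes(x, modes=[2])   # retains only the 2-cycle component
```

Keeping a fixed number of modes regardless of the series length is what makes the cost grow linearly with the length in the real model.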

018b417b17d6e6a7af98bd541e701f18.png

2.3.14 Crossformer(2023)

Paper:Crossformer: Transformer Utilizing Cross-Dimension Dependency for Multivariate Time Series Forecasting

Code: https://github.com/Thinklab-SJTU/Crossformer

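Crossformer proposes a new hierarchical Encoder-Decoder architecture, shown below, with the Encoder on the left (gray) and the Decoder on the right (light orange). It is built from three components: Dimension-Segment-Wise (DSW) embedding, which splits the series of each variable into segments and embeds each segment; Two-Stage Attention (TSA) layers, which attend first across time and then across dimensions; and Linear Projection.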

b2a1a24f44214f4f481eb67a57f5d577.jpeg
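The segmentation step of the DSW embedding can be sketched as follows. This is an assumption-laden illustration: `dsw_segments` is a hypothetical helper, and the learned linear projection and position embedding that turn each segment into a vector are omitted.

```python
def dsw_segments(series, seg_len):
    """Split a multivariate series into per-dimension segments.

    series: list of T rows, each row holding D values (one per variable).
    Returns a list of D items; each item is a list of T // seg_len segments.
    """
    num_steps, num_dims = len(series), len(series[0])
    if num_steps % seg_len != 0:
        raise ValueError("series length must be a multiple of seg_len")
    return [
        [[series[t][d] for t in range(start, start + seg_len)]
         for start in range(0, num_steps, seg_len)]
        for d in range(num_dims)
    ]


# Two variables observed for six time steps, cut into segments of length 3.
series = [[t, 10 + t] for t in range(6)]
segments = dsw_segments(series, seg_len=3)
```

The result is a two-dimensional grid of segments (dimension by time), which is the layout the two attention stages then operate on.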

2.4 Mix category

Integrating algorithms such as ETS, autoregression, RNN, CNN, and Attention makes it possible to combine their respective strengths and improve the accuracy and stability of time series prediction. This combined approach is often referred to as a "hybrid model". Among these components, RNN can learn long-term dependencies in time series data; CNN can extract local features and spatial features; and the Attention mechanism can adaptively focus on the important parts of the series. Fusing these algorithms can make a time series prediction model more robust and accurate. In practical applications, an appropriate fusion method should be selected for the prediction scenario at hand, and the resulting model should then be tuned and optimized.

2.4.1 Encoder-Decoder CNN(2017)

Paper:Deep Learning for Precipitation Nowcasting: A Benchmark and A New Model

Encoder-Decoder CNN is another model that can be used for time series prediction tasks. It is a convolutional neural network that combines an encoder and a decoder: the encoder extracts features from the time series, and the decoder generates the future time series.

Specifically, the Encoder-Decoder CNN model performs time series prediction in the following steps:

  • Input the historical time series data, and extract the features of the time series through the convolutional layer.

  • The feature sequence output by the convolutional layer is sent to the encoder, and the feature dimension is gradually reduced through the pooling operation, and the state vector of the encoder is saved.

  • The state vector of the encoder is fed into the decoder, and future time series data are gradually generated through deconvolution and upsampling operations.

  • Perform post-processing on the output of the decoder, such as reversing the de-meaning or normalization applied to the input, to get the final prediction result.

It should be noted that the Encoder-Decoder CNN model needs to use an appropriate loss function (such as mean square error or cross entropy) during training, and adjust hyperparameters as needed. In addition, in order to improve the generalization ability of the model, it is also necessary to use techniques such as cross-validation for model evaluation and selection.
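The four steps above can be traced with a toy data-flow sketch. It uses fixed, untrained weights and plain Python lists, so it shows only how the shapes change through convolution, pooling, upsampling, and post-processing; all function names are hypothetical and it does not produce a meaningful forecast.

```python
def normalize(x):
    """Standardize a series; returns the scaled values plus the statistics."""
    mean = sum(x) / len(x)
    std = (sum((v - mean) ** 2 for v in x) / len(x)) ** 0.5 or 1.0
    return [(v - mean) / std for v in x], mean, std


def conv1d(x, kernel):
    """'Valid' 1D convolution (cross-correlation) with a fixed kernel."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k))
            for i in range(len(x) - k + 1)]


def avg_pool(x, size):
    """Non-overlapping average pooling; a trailing partial window is dropped."""
    return [sum(x[i:i + size]) / size for i in range(0, len(x) - size + 1, size)]


def upsample(x, factor):
    """Nearest-neighbor upsampling: repeat every value `factor` times."""
    return [v for v in x for _ in range(factor)]


def encode_decode(history, kernel, pool_size):
    scaled, mean, std = normalize(history)
    features = conv1d(scaled, kernel)         # step 1: convolutional features
    state = avg_pool(features, pool_size)     # step 2: encoder reduces the length
    decoded = upsample(state, pool_size)      # step 3: decoder expands it again
    return [v * std + mean for v in decoded]  # step 4: undo the normalization


output = encode_decode([float(v) for v in range(1, 11)], kernel=[0.5, 0.5], pool_size=3)
```

In a trained model the kernels are learned and the decoder uses transposed convolutions rather than simple repetition.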

2.4.2 LSTNet (2018)

Paper:Modeling Long- and Short-Term Temporal Patterns with Deep Neural Networks

LSTNet is a deep learning model for time series prediction; its full name is Long- and Short-term Time-series Network. LSTNet combines a convolutional neural network (CNN) with a recurrent neural network (a GRU in the paper), so it can model short-term local patterns and long-term dependencies together, and it adds a recurrent-skip component to capture periodic patterns in the sequence. LSTNet was proposed by Guokun Lai et al. of Carnegie Mellon University in 2018.

The core idea of the LSTNet model is to use a CNN to extract features from time series data and then feed the extracted features into a recurrent layer for sequence modeling. LSTNet also runs a linear autoregressive component in parallel with the neural network and sums the two outputs, which keeps the prediction sensitive to changes in the scale of the input. The input of the LSTNet model is a time series matrix of shape (T, d), where T is the number of time steps and d is the feature dimension of each time step. The output of LSTNet is a prediction vector of length H, where H is the number of time steps predicted. During training, LSTNet can use the mean square error (MSE) as the loss function and is optimized with backpropagation.
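The input and output shapes described above can be produced from a raw series with a sliding window. The sketch below is an assumption about data preparation, not part of LSTNet itself; `make_windows` is a hypothetical helper, and using the first feature as the prediction target is an arbitrary choice.

```python
def make_windows(series, num_steps, horizon):
    """Build (input, target) pairs from a multivariate series.

    series: list of rows, each row holding d feature values.
    Each input has shape (num_steps, d); each target is a vector of
    length `horizon` taken from the first feature (an assumption).
    """
    inputs, targets = [], []
    for start in range(len(series) - num_steps - horizon + 1):
        end = start + num_steps
        inputs.append(series[start:end])
        targets.append([row[0] for row in series[end:end + horizon]])
    return inputs, targets


# Ten time steps with d = 2 features; T = 4 input steps, H = 2 predicted steps.
series = [[t, 2 * t] for t in range(10)]
inputs, targets = make_windows(series, num_steps=4, horizon=2)
```

Each pair can then be fed to any model that maps a (T, d) matrix to a length-H vector.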

1de2aaaf0b8f4cbfbe96a716eb9f3e07.jpeg

2.4.3 TDAN(2018)

Paper:TDAN: Temporal Difference Attention Network for Precipitation Nowcasting

TDAN (Time-aware Deep Attentive Network) is a deep learning algorithm for time series prediction that captures temporal patterns by fusing a convolutional neural network with an attention mechanism. Compared with a traditional convolutional neural network, TDAN makes more effective use of the temporal information in time series data, thereby improving the accuracy of time series prediction.

Specifically, the TDAN algorithm performs time series prediction in the following steps:

  • Input the historical time series data, and extract the features of the time series through the convolutional layer.

  • The feature sequence output by the convolutional layer is sent to the attention mechanism, and the weighted feature vector is calculated according to the weights related to the current prediction in the historical data.

  • The weighted feature vector is sent to the fully connected layer for final prediction.

It should be noted that the TDAN algorithm needs to use an appropriate loss function (such as mean square error) during the training process, and adjust hyperparameters as needed. In addition, in order to improve the generalization ability of the model, it is also necessary to use techniques such as cross-validation for model evaluation and selection.

The advantage of the TDAN algorithm is that it can adaptively focus on the part of the historical data that is relevant to the current forecast, thereby improving the accuracy of the time series forecast. At the same time, it can also deal with problems such as missing values and outliers in time series data, which gives it a degree of robustness.
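The attention step in the list above, weighting historical features by their relevance to the current prediction, can be sketched with dot-product scores and a softmax. This is a generic attention sketch under that assumption, not the TDAN paper's exact formulation; `attend` and the query vector are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(features, query):
    """Weight each historical feature vector by its similarity to the query.

    features: list of feature vectors, one per historical time step.
    query: vector describing what the current prediction is looking for.
    Returns (weights, context), where context is the weighted feature vector.
    """
    scores = [sum(f_i * q_i for f_i, q_i in zip(f, query)) for f in features]
    weights = softmax(scores)
    dim = len(features[0])
    context = [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]
    return weights, context


features = [[1.0, 0.0], [0.0, 1.0]]
uniform_weights, uniform_context = attend(features, query=[0.0, 0.0])
focused_weights, focused_context = attend(features, query=[10.0, 0.0])
```

A zero query treats every step equally, while a query aligned with the first feature concentrates almost all weight on the first step; the context vector is what would be passed to the fully connected layer.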

2.4.4 DeepAR(2019)

Paper:DeepAR: Probabilistic Forecasting with Autoregressive Recurrent Networks

DeepAR is an autoregressive recurrent neural network: it combines a recurrent neural network (RNN) with autoregression (AR) to forecast scalar (one-dimensional) time series. In many applications there are many similar time series across a set of related units. DeepAR trains a single model on all of these series together, such as the sales data of different instant noodle flavors, learns their shared characteristics through a deep recurrent neural network, and uses what it learns from the whole group to improve the prediction accuracy for each series. DeepAR generates a multi-step forecast over a configurable horizon. The forecast at each time step is probabilistic. By default, three values are output: P10, P50 and P90. These are quantiles of the predictive distribution; for example, P10 is the value that the outcome falls below with 10% probability. With a probabilistic forecast, we can either combine the three values into a single point forecast or use the P10 to P90 interval to make decisions.
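The P10, P50 and P90 values can be read off a set of sampled forecast paths. The sketch below assumes the model has already produced such samples; the deterministic `paths` list stands in for them, the helper names are hypothetical, and linear interpolation between order statistics is one of several common quantile conventions.

```python
import math


def quantile(values, q):
    """Quantile with linear interpolation between neighboring order statistics."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] * (1 - frac) + ordered[hi] * frac


def summarize(sample_paths, levels=(0.1, 0.5, 0.9)):
    """Per-step quantiles of a set of sampled forecast paths.

    sample_paths: list of paths, each a list with one value per future step.
    Returns one dict per step, e.g. {"P10": ..., "P50": ..., "P90": ...}.
    """
    horizon = len(sample_paths[0])
    return [
        {f"P{round(q * 100)}": quantile([path[h] for path in sample_paths], q)
         for q in levels}
        for h in range(horizon)
    ]


# Stand-in for 101 sampled trajectories over a 2-step horizon.
paths = [[float(i), 2.0 * i] for i in range(101)]
summary = summarize(paths)
```

Each entry of `summary` gives the interval for one future step, which is the form in which a P10 to P90 band is usually plotted or used for decisions.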

2.4.5 N-BEATS(2020)

Paper:N-BEATS: Neural basis expansion analysis for interpretable time series forecasting

Code: https://github.com/amitesh863/nbeats_forecast

N-BEATS (Neural basis expansion analysis for interpretable time series forecasting) is a neural network-based time series forecasting model developed by Boris Oreshkin et al. at Element AI. N-BEATS represents the forecast as an expansion over basis functions whose coefficients are produced by the network. The basis can be learned (the generic configuration) or fixed to trend and seasonality forms (the interpretable configuration), which improves the interpretability of the model while maintaining high accuracy. The model is a deep stack of fully connected blocks joined by backward and forward residual links: each block removes the part of the input it can explain and contributes a partial forecast.
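In the N-BEATS paper each block returns two outputs, a backcast (its reconstruction of the input) and a forecast, and blocks are chained so that each one sees only what the previous blocks left unexplained. A minimal sketch of that chaining, with a hand-written toy block standing in for the learned fully connected layers (`make_level_block` is hypothetical):

```python
def make_level_block(horizon):
    """Toy block: explains the mean level of its input and forecasts that level."""
    def block(residual):
        level = sum(residual) / len(residual)
        backcast = [level] * len(residual)
        forecast = [level] * horizon
        return backcast, forecast
    return block


def run_stack(x, blocks, horizon):
    """Chain blocks with backward (residual) and forward (forecast) links."""
    residual = list(x)
    forecast = [0.0] * horizon
    for block in blocks:
        backcast, block_forecast = block(residual)
        # Backward link: subtract what this block explained from the input.
        residual = [r - b for r, b in zip(residual, backcast)]
        # Forward link: add this block's partial forecast to the total.
        forecast = [f + bf for f, bf in zip(forecast, block_forecast)]
    return residual, forecast


horizon = 2
blocks = [make_level_block(horizon), make_level_block(horizon)]
residual, forecast = run_stack([2.0, 4.0, 6.0], blocks, horizon)
```

Here the first block absorbs the mean level and the second block finds nothing left at that level, so it adds nothing to the forecast; in the real model each block is trained to pick up a different part of the signal.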

3761739bf01a10f45e09c7e52b603739.png
 
  
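# Assumes the darts library, whose model class matches this name.
from darts.models import NBEATSModel

model = NBEATSModel(
    input_chunk_length=30,   # number of past steps fed to the model
    output_chunk_length=15,  # number of future steps predicted at once
    n_epochs=100,            # training epochs
    num_stacks=30,           # stacks in the generic architecture
    num_blocks=1,            # blocks per stack
    num_layers=4,            # fully connected layers per block
    dropout=0.0,
    activation='ReLU'
)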
498e7447ac40c12aa1ca370ae3669ddd.png

2.4.6 TCN-LSTM(2021)

Paper:A Comparative Study of Detecting Anomalies in Time Series Data Using LSTM and TCN Models

TCN-LSTM is a model that combines a Temporal Convolutional Network (TCN) with Long Short-Term Memory (LSTM) and can be used for time series prediction tasks. In this model the TCN layer and the LSTM layer work together: the TCN layer captures short-term features and the LSTM layer captures long-term dependencies. Specifically, the TCN layer stacks multiple convolutional layers (typically causal and dilated) to expand the receptive field, and uses residual connections to prevent vanishing gradients. The LSTM layer captures the long-term dependencies of the time series through its memory cell and gating mechanism.

The TCN-LSTM model can perform time series prediction according to the following steps:

  • Input the historical time series data, and extract the short-term features of the time series through the TCN layer.

  • The feature sequence output by the TCN layer is sent to the LSTM layer to capture the long-term dependencies of the time series.

  • The feature vector output by the LSTM layer is sent to the fully connected layer for final prediction.

It should be noted that the TCN-LSTM model needs to use an appropriate loss function (such as mean square error) during training, and adjust hyperparameters as needed. In addition, in order to improve the generalization ability of the model, it is also necessary to use techniques such as cross-validation for model evaluation and selection.
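How stacked convolutions expand the receptive field of the TCN layer can be checked with a small sketch. It assumes causal convolutions with doubling dilations, a common TCN design that the text does not spell out, uses one convolution per level, and omits the residual connections and the LSTM; the function names are hypothetical.

```python
def causal_dilated_conv(x, kernel, dilation):
    """y[t] = sum_j kernel[j] * x[t - j * dilation], treating x as 0 before t = 0."""
    out = []
    for t in range(len(x)):
        total = 0.0
        for j, weight in enumerate(kernel):
            idx = t - j * dilation
            if idx >= 0:
                total += weight * x[idx]
        out.append(total)
    return out


def receptive_field(kernel_size, dilations):
    """Number of past steps (including the current one) that reach one output."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


dilations = [1, 2, 4, 8]
signal = [1.0] + [0.0] * 19          # a single impulse at t = 0
for d in dilations:
    signal = causal_dilated_conv(signal, kernel=[1.0, 1.0], dilation=d)

span = receptive_field(kernel_size=2, dilations=dilations)
```

After the four layers the impulse has spread over exactly `span` time steps, which is why a few dilated layers are enough to cover a long input window before the features are handed to the LSTM.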

2.4.7 NeuralProphet(2021)

Paper:NeuralProphet: Explainable Forecasting at Scale

NeuralProphet is a neural network-based time series prediction framework built on PyTorch and inspired by Facebook's Prophet. It adds neural network components to Prophet's decomposable model in order to predict time series with complex nonlinear trends and seasonality more accurately.

The core idea of NeuralProphet is to combine Prophet's decomposition model with neural network modules, such as autoregression on past values, that learn the nonlinear characteristics of the time series. NeuralProphet provides a variety of neural network structures and optimization algorithms, which can be selected and adjusted according to specific application requirements. The characteristics of NeuralProphet are as follows:

  • Flexibility: NeuralProphet can handle time series data with complex trends and seasonality, and can flexibly set the neural network structure and optimization algorithm.

  • Accuracy: NeuralProphet can take advantage of the nonlinear modeling capabilities of neural networks to improve the accuracy of time series forecasting.

  • Interpretability: NeuralProphet can provide rich visualization tools to help users understand prediction results and influencing factors.

  • Ease of use: NeuralProphet is a Python package and provides a wealth of APIs and examples so that users can get started quickly.

NeuralProphet has a wide range of applications in many fields, such as finance, transportation, electricity, etc. It helps users predict future trends and changes in trends, and provides useful reference and decision support.

2.4.8 N-HiTS(2022)

Paper:N-HiTS: Neural Hierarchical Interpolation for Time Series Forecasting

N-HiTS (Neural Hierarchical Interpolation for Time Series) is a neural network-based time series prediction model proposed by Cristian Challu, Kin G. Olivares, Boris Oreshkin et al. as an extension of N-BEATS. It targets long-horizon forecasting of data such as product sales, traffic, and stock prices. The model adopts a hierarchical structure: each stack first subsamples its input with a pooling layer of a different kernel size, so that different stacks see the series at different time granularities, and then predicts a small number of coefficients that are interpolated up to the full forecast horizon. The outputs of the stacks are summed to give the final forecast. This multi-rate sampling and hierarchical interpolation reduce the computation and the number of parameters needed for long horizons.
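The "hierarchical interpolation" in the paper title can be sketched as follows: each stack pools the input at its own rate, produces a few coefficients, and those coefficients are interpolated to the forecast horizon before the stack outputs are summed. The toy coefficient functions below stand in for the learned layers, the residual links between stacks are omitted, and all names are hypothetical.

```python
def max_pool(x, kernel):
    """Non-overlapping max pooling; a larger kernel gives a coarser view."""
    return [max(x[i:i + kernel]) for i in range(0, len(x), kernel)]


def interpolate(coeffs, horizon):
    """Linearly interpolate a short coefficient list up to `horizon` points."""
    n = len(coeffs)
    if n == 1 or horizon == 1:
        return [float(coeffs[0])] * horizon
    out = []
    for h in range(horizon):
        pos = h * (n - 1) / (horizon - 1)
        lo = min(int(pos), n - 2)
        frac = pos - lo
        out.append(coeffs[lo] * (1 - frac) + coeffs[lo + 1] * frac)
    return out


def forecast(x, stacks, horizon):
    """Sum the interpolated outputs of all stacks."""
    total = [0.0] * horizon
    for kernel, predict_coeffs in stacks:
        pooled = max_pool(x, kernel)              # multi-rate input sampling
        coeffs = predict_coeffs(pooled)           # few coefficients per stack
        partial = interpolate(coeffs, horizon)    # expand to the full horizon
        total = [a + b for a, b in zip(total, partial)]
    return total


stacks = [
    (3, lambda pooled: [pooled[-1]]),        # coarse stack: one coefficient
    (1, lambda pooled: [0.0, 3.0]),          # fine stack: two coefficients
]
prediction = forecast([1.0, 3.0, 2.0, 5.0, 4.0, 0.0], stacks, horizon=4)
```

The coarse stack contributes a flat level and the fine stack a ramp, so a long horizon is covered with only three predicted numbers in total.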

9c36beccae1aa10b853771d63d2507ae.png
 
  
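# Assumes the darts library, whose model class matches this name.
from darts.models import NHiTSModel

model = NHiTSModel(
    input_chunk_length=30,   # number of past steps fed to the model
    output_chunk_length=15,  # number of future steps predicted at once
    n_epochs=100,            # training epochs
    num_stacks=3,            # stacks, each working at its own resolution
    num_blocks=1,            # blocks per stack
    num_layers=2,            # fully connected layers per block
    dropout=0.1,
    activation='ReLU'
)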
1f229486f13cbc01de00798a90fd78a3.png

2.4.9 D-Linear(2022)

Paper:Are Transformers Effective for Time Series Forecasting?

Code: https://github.com/cure-lab/LTSF-Linear

D-Linear (Decomposition Linear) is a linear time series prediction model proposed in the paper above, with code released by cure-lab. D-Linear first splits the input series into a trend component, extracted with a moving average, and a remainder (seasonal) component, then applies one linear layer to each component and sums the two outputs to obtain the forecast. Despite its simplicity, the paper reports that it outperforms several Transformer-based models on long-term forecasting benchmarks, and its small number of weights keeps the model easy to interpret. The same paper proposes N-Linear (Normalization Linear), which subtracts the last value of the input window from the series, applies a single linear layer, and adds that value back to the output in order to handle distribution shift between training and test data.
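The forward pass of the two linear models in the paper above is short enough to write out. The sketch assumes the weights are given (training is omitted), uses bias-free linear layers, and pads the moving average by repeating the edge values; the function names are hypothetical.

```python
def moving_average(x, kernel_size):
    """Moving average with the ends padded by repeating the edge values."""
    front = (kernel_size - 1) // 2
    back = kernel_size - 1 - front
    padded = [x[0]] * front + list(x) + [x[-1]] * back
    return [sum(padded[i:i + kernel_size]) / kernel_size for i in range(len(x))]


def linear(x, weights):
    """Bias-free linear layer: one row of weights per output step."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def dlinear_forecast(x, kernel_size, w_trend, w_seasonal):
    """Decompose, apply one linear layer per component, and sum the results."""
    trend = moving_average(x, kernel_size)
    seasonal = [value - level for value, level in zip(x, trend)]
    return [a + b for a, b in zip(linear(trend, w_trend), linear(seasonal, w_seasonal))]


def nlinear_forecast(x, weights):
    """Subtract the last value, apply one linear layer, and add the value back."""
    last = x[-1]
    shifted = [value - last for value in x]
    return [y + last for y in linear(shifted, weights)]


window = [1.0, 2.0, 5.0]
pick_last = [[0.0, 0.0, 1.0]]                     # one output step
d_out = dlinear_forecast(window, 3, pick_last, pick_last)
n_out = nlinear_forecast(window, [[0.0] * 3, [0.0] * 3])
```

With all-zero weights the second model simply repeats the last observed value, which shows how the subtraction anchors its forecasts to the current level of the series.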

 
  
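# Assumes the darts library, whose model classes match these names.
from darts.models import DLinearModel, NLinearModel

# DLinear: moving-average decomposition followed by linear layers.
dlinear_model = DLinearModel(
    input_chunk_length=15,   # number of past steps fed to the model
    output_chunk_length=13,  # number of future steps predicted at once
    batch_size=90,
    n_epochs=100,
    shared_weights=False,    # separate weights per component of a multivariate series
    kernel_size=25,          # window of the moving average used for the trend
    random_state=42
)

# NLinear: subtract the last input value, apply a linear layer, add it back.
nlinear_model = NLinearModel(
    input_chunk_length=15,
    output_chunk_length=13,
    batch_size=90,
    n_epochs=100,
    shared_weights=True,     # one set of weights for all components
    random_state=42
)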
932c5cf3dc4860e390b19c6d43e0a926.png 1fdf5d85129ac9693ed9eb43b91b081c.png

Editor: Huang Jiyan

e276ca978c0c9e37c08e459bb29820b2.png


Origin blog.csdn.net/tMb8Z9Vdm66wH68VX1/article/details/131907476