A Survey of Deep Learning for Time Series Forecasting


Source: 算法进阶
This article is about 4,300 words and is a suggested 8-minute read. It focuses on introducing and comparing multiple Transformer-based time series forecasting methods.


Abstract: A time series is a set of random variables obtained by observing the development and change of some process and recording it at a certain frequency. The task of time series forecasting is to mine the core regularities hidden in large volumes of data and, on the basis of known factors, make accurate estimates of future values. With the proliferation of IoT data-acquisition devices, the explosive growth of multi-dimensional data, and increasingly strict accuracy requirements, classic parametric models and traditional machine learning algorithms struggle to meet the efficiency and precision demands of forecasting tasks. In recent years, deep learning algorithms represented by convolutional neural networks, recurrent neural networks, and Transformer models have achieved fruitful results on time series forecasting tasks. To further promote the development of time series forecasting technology, this paper reviews the common characteristics of time series data, benchmark data sets, and model evaluation metrics; taking time and algorithm architecture as the main research line, it compares and experimentally analyzes the characteristics, advantages, and limitations of each forecasting algorithm. It focuses on introducing and comparing multiple Transformer-based time series forecasting methods. Finally, in light of the problems and challenges of applying deep learning to time series forecasting, future research trends in this direction are discussed. (The download address of the paper is attached at the end of the article.)

1 Introduction

With the widespread deployment of IoT sensors, almost every field of science is generating massive amounts of time series data at an immeasurable rate. Traditional parametric models and machine learning algorithms have found it difficult to process such data efficiently and accurately, so using deep learning to mine useful information from time series has become a focus for many researchers. Classification and clustering [1-4], anomaly detection [5-7], event prediction [8-10], and time series forecasting [11-14] are the four key research directions for time series data. Existing surveys of time series forecasting summarize classic parametric models and traditional machine learning algorithms, but lack coverage of the latest Transformer-based methods and comparative experiments on data sets commonly used across industries. The remainder of this article analyzes time series forecasting from the perspective of deep learning and conducts comparative experiments on different data sets with multiple evaluation metrics in various GPU environments. The development of deep-learning-based time series forecasting algorithms is shown in Figure 1:

[Figure 1: Development of deep-learning-based time series forecasting algorithms]

Time series forecasting is the most common and important application among time series tasks. By mining the latent regularities of a time series, analogy and extrapolation can be used to solve many real-world problems, including noise elimination [15], stock market analysis [16-17], electric load forecasting [18], traffic condition forecasting [19-20], and flu epidemic early warning [21].

When the raw data provided for a forecasting task consists only of the historical values of the target variable, the task is univariate time series forecasting; when the raw data contains several random variables, it is multivariate time series forecasting. Forecasting tasks can be divided into four categories according to the prediction time span, as shown in Figure 2:

[Figure 2: Categories of time series forecasting tasks by prediction time span]

The rest of the article focuses on deep-learning-based time series forecasting algorithms. The second section introduces the characteristics of time series data, the third section introduces common data sets and evaluation metrics for forecasting tasks, the fourth section reviews the research progress and applications of deep learning in time series forecasting, and the fifth section looks ahead to future research directions in this field.

2 Characteristics of time series data 

Time series forecasting learns from and analyzes the historical data of the first t−1 time steps to estimate the values over a specified future period. Because of latent relationships among variables, time series data often exhibit one or more characteristic properties. To build a more comprehensive understanding of time series forecasting, this section introduces these common characteristics in detail.
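Before turning to those characteristics, the task itself can be stated compactly as follows (our own notation, not taken from the paper): given the first t−1 observations, predict the next h values,

```latex
\hat{x}_{t}, \hat{x}_{t+1}, \dots, \hat{x}_{t+h-1} \;=\; f\!\left(x_{1}, x_{2}, \dots, x_{t-1}\right),
\qquad x_{i} \in \mathbb{R}^{d},
```

where d = 1 gives the univariate case, d > 1 the multivariate case, and f is the model to be learned.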

(1) Massiveness: With upgrades to IoT sensing equipment and increases in sampling frequency and measurement dimensionality, time series data have grown explosively, and high-dimensional time series now dominate [22]. Effective preprocessing at the data set level is key to completing forecasting tasks with high quality.

(2) Trend: The data at the current moment are often closely related to the data of the preceding period. This implies that, under the influence of other factors, the series usually follows a certain pattern of change and may, over a long horizon, show a tendency to rise steadily, fall steadily, or remain level.

(3) Periodicity: The data in a time series are affected by external factors and show alternating rises and falls over long periods [23], rather than moving smoothly in an approximately straight line in one direction.

(4) Volatility: With the passage of time and the influence of multiple external factors, the variance and mean of a time series may change systematically, which affects forecasting accuracy to some extent.

(5) Stationarity: Some time series fluctuate randomly but exhibit consistent statistical regularities over time, with relatively stable variance and mean.

(6) Symmetry: If, within a certain period, the distance between the original time series and its time-reversed counterpart stays within a given threshold and the two curves are essentially aligned, the series is considered symmetric [24]; examples include the back-and-forth operation of large transport vehicles in ports and the raising and lowering of crane arms.


3 Time series forecasting methods based on deep learning

In the early days of deep-learning-based time series forecasting, the amount of task data was small and shallow neural networks could be trained quickly. However, as data volumes grew and accuracy requirements kept rising, shallow networks became far from sufficient. In recent years, deep learning has attracted extensive attention from researchers in many fields. Compared with traditional algorithms, deep learning methods have shown stronger performance on time series forecasting tasks and have seen sustained development and wide use. Compared with shallow networks, deep neural networks have better linear and nonlinear feature extraction capabilities and can uncover regularities that shallow networks easily miss, ultimately meeting the requirements of high-precision forecasting [30]. The remainder of this section describes three broad categories of deep learning models for time series forecasting.

3.1 Convolutional Neural Network 

3.1.1 Convolutional Neural Network

A convolutional neural network (CNN) is a deep feed-forward network built around convolution and pooling operations; it was originally designed for image recognition in computer vision [31-32]. The principle of using CNNs for time series forecasting is that the convolution kernel perceives the changes in historical data over a period of time and makes predictions based on those changes. Pooling retains the key information while reducing redundancy. CNNs greatly reduce the manual effort that earlier algorithms required for feature extraction and avoid the errors such manual work introduces. CNNs require a large number of samples and are mostly used on data sets with spatial characteristics. The network generally has five layers, with the specific structure shown in Figure 4:

[Figure 4: Structure of a convolutional neural network]
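As a concrete illustration of this principle, a minimal 1D-CNN forecaster might look like the following sketch (our own simplification in PyTorch; it is not the specific five-layer architecture of Figure 4, and the channel count and horizon are chosen arbitrarily):

```python
import torch
import torch.nn as nn

class SimpleCNNForecaster(nn.Module):
    """Minimal 1D-CNN: convolution and pooling extract local patterns
    from a history window, a linear head maps them to future values."""
    def __init__(self, horizon: int = 24, channels: int = 32):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, channels, kernel_size=3, padding=1),  # local pattern extraction
            nn.ReLU(),
            nn.MaxPool1d(2),                                   # keep salient info, reduce redundancy
            nn.Conv1d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),                           # summarize the whole window
        )
        self.head = nn.Linear(channels, horizon)

    def forward(self, x):              # x: (batch, window) past observations
        x = x.unsqueeze(1)             # -> (batch, 1, window) as a single input channel
        z = self.features(x).squeeze(-1)
        return self.head(z)            # -> (batch, horizon) predicted future values

model = SimpleCNNForecaster()
y_hat = model(torch.randn(8, 96))      # 96-step history -> forecast of shape (8, 24)
```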

[Table 1: Comparison of CNN-based forecasting models]

As Table 1 shows, when handling short-term forecasting on a multivariate data set with a large sample size, Kmeans-CNN, which clusters and classifies the data before training the model, achieves fairly good prediction results, and many later researchers adopt a similar strategy for time series forecasting. TCN, which introduces architectural elements such as dilated convolution and residual connections, can retain longer effective history and also achieves good results with a relatively simple and clear network. At present, the prediction accuracy of CNNs is not superior to that of other architectures such as recurrent neural networks, and CNNs alone struggle with long-horizon forecasting, but they are often used as powerful modules plugged into other, more advanced models.
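The dilated causal convolution and residual connection attributed to TCN above can be sketched roughly as follows (a simplified reading of the idea, not the original TCN implementation; channel counts and dilations are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DilatedCausalBlock(nn.Module):
    """One TCN-style residual block: a dilated causal convolution widens the
    receptive field exponentially with depth, and the residual connection
    eases the training of deeper stacks."""
    def __init__(self, channels: int, kernel_size: int = 3, dilation: int = 1):
        super().__init__()
        # left-pad so the convolution never sees future time steps (causality)
        self.pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)
        self.act = nn.ReLU()

    def forward(self, x):                        # x: (batch, channels, time)
        out = F.pad(x, (self.pad, 0))            # pad on the left only
        out = self.act(self.conv(out))
        return out + x                           # residual connection

# stacking blocks with dilations 1, 2, 4, 8 grows the usable history quickly
net = nn.Sequential(*[DilatedCausalBlock(16, dilation=2 ** i) for i in range(4)])
y = net(torch.randn(8, 16, 96))                  # -> (8, 16, 96), receptive field ~31 steps
```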

3.2 Recurrent Neural Network

Recurrent neural networks (RNNs) have been an important approach to time series forecasting since they were proposed, and they are often embedded as modules in other algorithms to improve forecasting results. Before 2017 they were widely used as the main models for time series forecasting. The experimental performance comparison and overall analysis of the main recurrent network algorithms are shown in Table 3 and Table 4:

[Table 3: Performance comparison of recurrent neural network models]

[Table 4: Overall analysis of recurrent neural network models]

As Table 3 shows, GRU and LSTM are comparable in performance, but both learn from only one direction, so their prediction accuracy is lower than that of the Bi-LSTM model, which gathers information from both directions. For short-term forecasting, Bi-LSTM needs few samples, fits quickly, and predicts accurately, and it is still used by many researchers today. Recurrent networks can capture and exploit long- and short-term temporal dependencies, but they perform poorly on long-sequence forecasting tasks; their largely serial computation leads to very high memory consumption during training, and the problems of vanishing and exploding gradients have not been completely solved.
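A rough sketch of the Bi-LSTM setup discussed above (our own minimal PyTorch version, not any specific model from Table 3; window, hidden size, and horizon are illustrative):

```python
import torch
import torch.nn as nn

class BiLSTMForecaster(nn.Module):
    """A bidirectional LSTM reads the history window from both directions,
    then a linear head maps the final features to the forecast horizon."""
    def __init__(self, n_features: int = 1, hidden: int = 64, horizon: int = 12):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, horizon)   # 2x: forward + backward states

    def forward(self, x):             # x: (batch, window, n_features)
        out, _ = self.lstm(x)         # out: (batch, window, 2*hidden)
        return self.head(out[:, -1])  # last time step's features -> (batch, horizon)

model = BiLSTMForecaster()
y_hat = model(torch.randn(8, 48, 1))  # 48-step history -> forecast of shape (8, 12)
```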

3.3 Transformer-class models

Before introducing the Transformer model, we must first introduce the attention mechanism. The human eye has a wide field of view but limited visual resources, so it tends to focus on specific parts of the scene. The attention mechanism is inspired by this and concentrates on the more valuable parts of the data [48-49]. The self-attention mechanism adopted by the Transformer handles the case where the input to the network is a large set of vectors of different sizes; vectors at different time steps often have latent connections, and if training fails to capture these connections between inputs, the model performs poorly.

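The scaled dot-product self-attention at the heart of the Transformer can be written compactly as follows (a generic sketch with randomly initialized projection matrices, not any particular surveyed variant):

```python
import math
import torch

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention: every time step attends to every
    other step, so dependencies are captured regardless of their distance."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v                        # (batch, time, d_k) projections
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))   # pairwise similarity
    weights = torch.softmax(scores, dim=-1)                    # attention distribution per step
    return weights @ v                                         # weighted mix of all time steps

d_model, d_k = 16, 16
x = torch.randn(8, 96, d_model)                    # a batch of 96-step sequences
w_q = torch.randn(d_model, d_k)
w_k = torch.randn(d_model, d_k)
w_v = torch.randn(d_model, d_k)
out = self_attention(x, w_q, w_k, w_v)             # -> (8, 96, 16)
```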

Transformer-class algorithms are now widely used across artificial intelligence tasks. Models built on the Transformer break the bottlenecks of earlier algorithms: they capture both short- and long-term dependencies well, handle long-sequence forecasting effectively, and can be computed in parallel. The performance comparison and overall analysis of the above algorithms are shown in Table 6 and Table 7:

[Table 6: Performance comparison of Transformer-class forecasting models]

[Table 7: Overall analysis of Transformer-class forecasting models]

As Table 6 shows, Transformer-class algorithms need large amounts of training data to avoid overfitting and perform well on medium- and long-term forecasting tasks. Some Transformer variants have begun to re-examine the role of the attention mechanism while retaining the encoder-decoder architecture, because self-attention can be unreliable in complex long-sequence forecasting. Informer and related models sacrifice part of the effective information in exchange for lower complexity; Conformer complements global GRU modeling with local attention; Pyraformer still performs well on relatively modest hardware, which alleviates the high equipment requirements of Transformer algorithms to some extent and makes it suitable for wider use in less-developed regions.
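One simple way to picture how these variants trade attention coverage for complexity is to restrict each query to a local window, as in the sketch below (a generic illustration only; Informer, Conformer, and Pyraformer each use more elaborate sparsity or pyramid schemes, and the dense masking here does not by itself reduce compute):

```python
import math
import torch

def local_attention(q, k, v, window: int = 8):
    """Each query attends only to keys within +/- `window` steps. Computed
    densely for clarity; efficient variants avoid building the full L x L
    score matrix and thereby escape the O(L^2) cost of full attention."""
    L = q.size(1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))   # (batch, L, L)
    idx = torch.arange(L)
    mask = (idx[None, :] - idx[:, None]).abs() > window        # True = outside window
    scores = scores.masked_fill(mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(4, 96, 16)
out = local_attention(q, k, v)        # -> (4, 96, 16), each step sees only its neighbours
```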

3.4 Summary

After a brief introduction to time series data, classic parametric models, and algorithm evaluation metrics, the article has systematically summarized deep-learning-based time series forecasting algorithms, with Transformer-based models as the main focus, and analyzed the advantages and disadvantages of their network architectures in depth. Since the attention mechanism was proposed, research on time series forecasting has entered the fast lane and achieved remarkable results. The key issues and further research directions in the field are listed below to promote the study and improvement of time series forecasting algorithms.

(1) Use stochastic nature-inspired optimization algorithms to tune the hyperparameters of deep learning models. Deep learning algorithms are becoming more and more complex, with ever more hyperparameters to set, and the choice of hyperparameters often determines whether an algorithm can escape local optima and reach a global optimum. Stochastic nature-inspired optimization algorithms draw on swarm intelligence, the natural behavior of animals, physical laws, and the laws of evolution. Such an algorithm first randomly generates a number of feasible solutions under the problem's constraints and then iteratively searches for the global optimum at each stage, finding the best hyperparameters within the given range to improve the model's predictive ability. Using stochastic nature-inspired optimization to find optimal model hyperparameters will therefore be one of the hotspots of future research; a minimal sketch of one such algorithm follows.
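The sketch below implements a bare-bones particle swarm optimization over two hyperparameters; `validation_loss` is a hypothetical stand-in for actually training the forecasting model and measuring its validation error:

```python
import numpy as np

def validation_loss(params):
    """Hypothetical objective: train the model with these hyperparameters and
    return its validation error. A toy quadratic surface stands in here."""
    lr_exp, hidden = params
    return (lr_exp + 3.0) ** 2 + (hidden - 64.0) ** 2 / 1000.0

def particle_swarm(obj, bounds, n_particles=20, iters=50, w=0.7, c1=1.5, c2=1.5):
    """Minimal PSO: particles move under the pull of their own best position
    and the swarm's best position, repeatedly refining the global optimum."""
    rng = np.random.default_rng(0)
    lo, hi = np.array(bounds).T
    pos = rng.uniform(lo, hi, size=(n_particles, len(bounds)))
    vel = np.zeros_like(pos)
    pbest, pbest_val = pos.copy(), np.array([obj(p) for p in pos])
    gbest = pbest[pbest_val.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)                 # stay inside the search range
        vals = np.array([obj(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest

# search log10(learning rate) in [-5, -1] and hidden size in [16, 256]
best = particle_swarm(validation_loss, bounds=[(-5, -1), (16, 256)])
```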

(2) Study network architectures suited to small data sets with irregular time intervals. Existing Transformer models have complex structures and many parameters; they perform excellently on periodic data sets but poorly on data sets with little data and irregular sampling intervals. The overfitting of Transformer-class models on small data deserves further consideration. When dealing with irregularly sampled data, introducing resampling, interpolation, filtering, or other preprocessing into the model pipeline according to the characteristics of the data and the task is a promising new idea (see the sketch below) and will be a research direction in the future.
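One way to realize the resampling and interpolation step mentioned above is a simple pandas preprocessing pass (an assumed tooling choice, not one prescribed by the paper):

```python
import pandas as pd

# Irregularly sampled observations (timestamps are not evenly spaced)
ts = pd.Series(
    [1.0, 1.4, 2.1, 1.8],
    index=pd.to_datetime([
        "2023-01-01 00:00", "2023-01-01 00:07",
        "2023-01-01 00:31", "2023-01-01 01:02",
    ]),
)

# Resample onto a regular 15-minute grid, then fill the gaps by
# time-weighted linear interpolation before feeding the model.
regular = ts.resample("15min").mean().interpolate(method="time")
print(regular)
```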

(3) Introduce graph neural networks (GNNs) for multivariate time series forecasting. Accurate multivariate forecasting is challenging because of the complexity of latent correlations among variables and the variability of data correlations in the real world. Recently, many researchers have used temporal polynomial graph neural networks to represent dynamic variable correlations as dynamic matrix polynomials, which better captures spatiotemporal dynamics and latent contingencies and has reached state-of-the-art performance in both short- and long-term multivariate forecasting. The powerful modeling ability of GNNs for multivariate forecasting therefore deserves further study; a minimal graph-convolution sketch is given below.
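A single graph-convolution step, far simpler than the temporal polynomial GNNs cited above, illustrates how information is mixed across correlated variables (the adjacency matrix here is random for demonstration; in practice it would be given or learned):

```python
import torch
import torch.nn as nn

class SimpleGraphConv(nn.Module):
    """One graph-convolution step: each variable aggregates features from its
    neighbours through a normalised adjacency matrix, mixing information
    across correlated series before (or alongside) temporal modelling."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):                   # x: (batch, n_vars, in_dim)
        deg = adj.sum(-1, keepdim=True).clamp(min=1.0)
        msg = (adj / deg) @ x                    # mean aggregation over neighbours
        return torch.relu(self.lin(msg))

n_vars, in_dim = 7, 24                           # 7 series, 24-step history each
x = torch.randn(8, n_vars, in_dim)
adj = (torch.rand(n_vars, n_vars) > 0.5).float() # assumed/learned variable graph
out = SimpleGraphConv(in_dim, 32)(x, adj)        # -> (8, 7, 32)
```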

(4) Study differentiable loss functions that account for both precise shape and temporal dynamics as evaluation metrics. Many metrics have been used in time series forecasting, and point-wise error losses based on Euclidean distance, such as MSE, are widely applied to time series data, but their point-by-point mapping is insensitive to distortions in shape and time delay. A loss function should not only minimize the gap between the forecast and the target series but also consider the correlation between the whole output sequence and the ground truth, helping the model produce more timely, robust, and accurate forecasts rather than optimizing point by point. A loss that evaluates the model in terms of curve shape and temporal awareness would be more conducive to training efficient and accurate forecasting models; a toy sketch of the idea follows.
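As a toy illustration of the idea, the point-wise MSE can be combined with an error on first differences so that the loss also rewards matching the curve's shape (a deliberate simplification of ours, not the more elaborate shape- and time-aware losses in the literature):

```python
import torch
import torch.nn.functional as F

def shape_aware_loss(pred, target, alpha: float = 0.5):
    """Blend point-wise MSE with an MSE on first differences, so mismatched
    slopes (shape) are penalised as well as mismatched values."""
    point = F.mse_loss(pred, target)
    shape = F.mse_loss(pred[:, 1:] - pred[:, :-1],
                       target[:, 1:] - target[:, :-1])
    return (1 - alpha) * point + alpha * shape

pred = torch.randn(8, 24, requires_grad=True)
target = torch.randn(8, 24)
loss = shape_aware_loss(pred, target)   # differentiable, usable directly in training
loss.backward()
```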

Paper address:

http://fcst.ceaj.org/CN/10.3778/j.issn.1673-9418.2211108

Editor: 于腾凯
Proofreader: 刘光栋

Origin: blog.csdn.net/tMb8Z9Vdm66wH68VX1/article/details/131355927