A Time Series Analysis of Automotive Data Streams

Author: Zen and the Art of Computer Programming

1 Introduction

With the continuous development of information technology, the rapid growth of the automotive industry, and the generation of massive data sets, every industry now faces challenges from big data. From data collection to storage, analysis, and decision-making, people need new technical means to process these massive data sets quickly and accurately. Among these techniques, time series analysis plays an important role in areas such as electric vehicles, network traffic, and sensor data. The main task of time series analysis is to extract valuable information and patterns from time series data and apply them to business decisions. Driven by the exponential growth of data, time series analysis technology has also developed rapidly.

Time series data analysis is a discipline in its own right that draws on multiple fields, such as mathematics, statistics, computer science, database systems, and information theory. This article introduces a recent development in time series analysis: a time series forecasting method based on machine learning, called Autoformer, proposed by researchers at Tsinghua University. Autoformer is a scalable and efficient time series prediction model that can handle high-dimensional, long-horizon time series data. It is characterized by a lightweight design, ease of use, small model size, fast computation, and high prediction accuracy.

Autoformer draws on earlier deep probabilistic forecasting work such as Amazon's DeepAR model, and uses a Transformer with residual connections to learn long-term dependencies. This structure enables efficient training while keeping the computational cost manageable. (Figure: overall architecture of the model.)

1. Autoformer model

(1) Basic knowledge of time series prediction

First, we need to understand what time series forecasting is. Given a sequence of past observations as input, the goal is to predict the most likely future values. Generally, time series prediction methods fall into two categories: supervised and unsupervised.

1.1 Supervised methods for time series prediction

When we have a large amount of labeled data, we can train the model with supervised methods. Supervised methods fall into two categories: regression prediction and classification prediction.

1.1.1 Regression prediction

Regression prediction means that we assume a mapping from the input sequence to the output, and use this mapping to predict future outputs. For example, suppose we want to predict electricity bills. We have electricity usage data for many past periods as input, and the electricity bill is the output. In this case, we can build a linear regression model, fit it to the known data, and use it to predict future bills.
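As a hedged illustration of the electricity-bill example (the data and the 0.12 price factor below are invented for demonstration), a linear regression sketch with scikit-learn:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: 24 hourly usage readings (kWh) per period -> bill amount.
rng = np.random.default_rng(0)
X = rng.uniform(10, 50, size=(100, 24))            # usage for 100 past periods
y = X.sum(axis=1) * 0.12 + rng.normal(0, 1, 100)   # bill ~ usage * price + noise

model = LinearRegression().fit(X, y)               # fit to the known data
next_period = rng.uniform(10, 50, size=(1, 24))
print("Predicted bill:", model.predict(next_period)[0])
```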

1.1.2 Classification prediction

Another approach is to predict events or states that belong to discrete categories. For example, we can predict whether the stock market will fall or rise. In this case, we can train a classification model such as logistic regression or an SVM.
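Along the same lines, a minimal sketch of the stock-direction example, with synthetic features standing in for real market data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical features: returns over the last 5 days; label: 1 if next day rises.
rng = np.random.default_rng(1)
X = rng.normal(0, 0.02, size=(500, 5))
y = (X.mean(axis=1) + rng.normal(0, 0.01, 500) > 0).astype(int)

clf = LogisticRegression().fit(X, y)
print("P(rise) for a flat week:", clf.predict_proba(np.zeros((1, 5)))[0, 1])
```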

1.2 Evaluation criteria for time series prediction

When evaluating the quality of time series prediction models, we usually use the following three criteria:

1.2.1 Mean Squared Error (MSE)

MSE measures how far the model's predictions deviate from the true values; a lower MSE means the predictions are closer to the truth. However, MSE only describes performance on the data at hand and cannot by itself tell us whether the model generalizes well.

1.2.2 Mean Absolute Percentage Error (MAPE)

MAPE measures the error of the predictions as a percentage of the true values, which gives a more intuitive picture of prediction performance; a lower MAPE means more accurate predictions. However, MAPE is undefined when true values are zero, so it cannot always be used directly to compare models across datasets.

1.2.3 R-squared (R²)

R² measures how much of the variation in the dependent variable is explained by the independent variables; a larger R² means the model explains more of that variation. However, R² only describes the goodness of fit and cannot by itself establish the validity of the model.
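All three criteria are straightforward to compute; a short sketch (note that the MAPE here assumes no true value is zero):

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

def mape(y_true, y_pred):
    # Mean Absolute Percentage Error; undefined when y_true contains zeros.
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

y_true = np.array([100.0, 120.0, 90.0, 110.0])
y_pred = np.array([98.0, 125.0, 95.0, 108.0])

print("MSE :", mean_squared_error(y_true, y_pred))
print("MAPE:", mape(y_true, y_pred), "%")
print("R^2 :", r2_score(y_true, y_pred))
```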

In summary, the basic approach to evaluating a time series prediction model is to compare its predictions on a test set against the actual values. Note that the test data should follow the same distribution as the training data, or the evaluation results will be misleading. In addition, different evaluation criteria suit different problems, so before selecting the best model we should choose the criteria according to the situation at hand.

(2) DeepAR model

2.1 DeepAR Overview

Currently, the most advanced time series prediction models are generally based on neural networks. However, deep learning technology for this task is still maturing, and some of its building blocks, such as dynamic recurrent neural networks (DRNNs) and convolutional recurrent neural networks (CRNNs), are not yet fully settled.

The DeepAR model is a time series prediction model based on a dynamic recurrent network (DRNN). The basic idea is to use recurrence along the time dimension to capture historical information. Specifically, DeepAR treats the input sequence as generated step by step, and the output at each moment depends only on a fixed-length window of inputs before that moment. As a result, DeepAR requires little preprocessing of the input sequence and can accept raw data directly, as the sketch below illustrates.
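To make the fixed-length-history idea concrete, here is a minimal sketch (not DeepAR itself) that builds training pairs in which each target depends only on a fixed window of raw history:

```python
import numpy as np

def sliding_windows(series, context_length):
    """Build (history, next_value) training pairs from a raw series,
    so each target depends only on a fixed-length history before it."""
    X, y = [], []
    for t in range(context_length, len(series)):
        X.append(series[t - context_length:t])
        y.append(series[t])
    return np.array(X), np.array(y)

series = np.sin(np.linspace(0, 20, 200))   # toy raw series, no preprocessing
X, y = sliding_windows(series, context_length=24)
print(X.shape, y.shape)                    # (176, 24) (176,)
```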

(Figure: architecture of the DeepAR model.)

2.2 Transformer-based Model

In order to capture long-term dependencies in the historical information, the model adopts a Transformer structure. However, self-attention is expensive on long sequences, since its cost grows quadratically with sequence length, so the model uses residual connections to keep the deeper network easy to train.

The Transformer layer of the model combines a self-attention sublayer, a feed-forward sublayer, and residual connections.

Here, attention is the mechanism that decides which positions in the current time step's input sequence are the important features. The self-attention layer computes input sequence features by learning to combine information from within the sequence, while the feed-forward layer applies a position-wise transformation to the hidden representation. Residual connections add the output of each sublayer to its input, which mitigates the vanishing-gradient problem; a compact sketch follows.
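A minimal PyTorch sketch of this layer pattern (the dimensions and hyperparameters are illustrative assumptions, not the model's actual settings):

```python
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    def __init__(self, d_model=64, n_heads=4, d_ff=256, dropout=0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout,
                                          batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                nn.Linear(d_ff, d_model))
        self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)

    def forward(self, x):
        # Residual connection around self-attention mitigates vanishing gradients.
        attn_out, _ = self.attn(x, x, x)
        x = self.norm1(x + attn_out)
        # Residual connection around the position-wise feed-forward sublayer.
        return self.norm2(x + self.ff(x))

x = torch.randn(8, 24, 64)                 # (batch, time steps, d_model)
print(TransformerBlock()(x).shape)         # torch.Size([8, 24, 64])
```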

2.3 Autoformer

The Autoformer model is a time series prediction model built on this Transformer-based design. Its main contribution is to use the Transformer structure to capture long-term dependencies in the historical information. The main improvements of the Autoformer model are:

  1. Autoregressive structure: at each time step the model conditions only on the outputs of previous steps, which lets it capture long-term dependencies in the historical sequence.
  2. Multi-scale encoding: the model captures time series features at different scales.
  3. Split point encoding: features of different scales are fused through split point encoding, described in section 3.3.

(Figure: overall structure of the Autoformer model.)

(3) Autoformer model

3.1 Model training and hyperparameter settings

The Autoformer model performs time series prediction by predicting the conditional probability distribution of future time steps. Training relies on the negative log-likelihood (NLL) loss: during training, the model minimizes the NLL of the true values, which is equivalent to maximizing their likelihood. The model achieves this by learning a conditional probability distribution at each time step.
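For real-valued series, the NLL is often taken over a normal distribution whose parameters the network emits; a minimal sketch of that loss:

```python
import torch

def gaussian_nll(mu, sigma, target):
    """Negative log-likelihood of target under N(mu, sigma^2).
    Minimizing this maximizes the likelihood of the observed values."""
    dist = torch.distributions.Normal(mu, sigma)
    return -dist.log_prob(target).mean()

mu = torch.tensor([0.5, 1.2])
sigma = torch.tensor([0.3, 0.4])
target = torch.tensor([0.6, 1.0])
print(gaussian_nll(mu, sigma, target))
```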

The training process of the Autoformer model consists of three steps:

  1. Data preprocessing: preprocess the input data, including normalization and filling in missing values.
  2. Model training: train the model on the training set; this includes building the model structure, defining the optimizer, and running the training loop.
  3. Model testing: evaluate the model on the test set and compute the error between the predicted and true values (a minimal sketch of steps 2 and 3 follows this list).
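A minimal sketch of steps 2 and 3, assuming a Gaussian output and stand-in random data in place of a real data loader:

```python
import torch

# Assumption: a real model would be the Autoformer; here a stand-in linear
# layer maps a 24-step input window to (mu, raw_sigma).
model = torch.nn.Linear(24, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

def nll(mu, sigma, y):
    # Negative log-likelihood under N(mu, sigma^2).
    return -torch.distributions.Normal(mu, sigma).log_prob(y).mean()

for epoch in range(10):                              # step 2: model training
    x = torch.randn(32, 24)                          # stand-in normalized batch
    y = torch.randn(32)
    out = model(x)
    mu = out[:, 0]
    sigma = torch.nn.functional.softplus(out[:, 1]) + 1e-6
    loss = nll(mu, sigma, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

with torch.no_grad():                                # step 3: model testing
    x_test, y_test = torch.randn(32, 24), torch.randn(32)
    mu = model(x_test)[:, 0]
    print("test MSE:", torch.mean((mu - y_test) ** 2).item())
```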

In addition, the Autoformer model's prediction accuracy can be improved by tuning its structural parameters. The hyperparameters of the Autoformer model include the following (an illustrative configuration object follows the list):

  1. Batch size: the number of training samples in each batch.
  2. Sequence length: the length of the input sequence.
  3. Number of heads: the number of heads in the attention mechanism.
  4. Head size: the dimension of each attention head.
  5. Dropout rate: the dropout probability.
  6. Learning rate: the initial learning rate of the Adam optimizer.
  7. Epochs: the number of training rounds.
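For illustration only (the default values below are invented, not the model's published settings), these hyperparameters can be collected in a single configuration object:

```python
from dataclasses import dataclass

@dataclass
class AutoformerConfig:
    batch_size: int = 32        # training samples per batch
    seq_len: int = 96           # length of the input sequence
    n_heads: int = 8            # number of attention heads
    head_size: int = 64         # dimension of each attention head
    dropout: float = 0.1        # dropout probability
    lr: float = 1e-4            # initial Adam learning rate
    epochs: int = 10            # number of training rounds

cfg = AutoformerConfig()
print(cfg)
```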

3.2 Model structure

(Figure: main structure of the Autoformer model.)

3.2.1 Input Layer

The input layer receives input data, including observation sequences and features, and processes them.

3.2.2 Feature Extraction Layer

The feature extraction layer embeds features into the time series to obtain a feature vector.

3.2.3 Position Encoding Layer

The position encoding layer encodes the time step into a position vector, allowing the model to capture the characteristics of the input sequence at different positions.
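A standard sinusoidal position encoding in the style of the original Transformer (whether Autoformer uses exactly this variant is an assumption here):

```python
import torch

def positional_encoding(seq_len, d_model):
    """Sinusoidal position vectors: even dimensions use sine, odd use cosine."""
    pos = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)
    i = torch.arange(0, d_model, 2, dtype=torch.float32)
    angle = pos / torch.pow(10000.0, i / d_model)
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(angle)
    pe[:, 1::2] = torch.cos(angle)
    return pe

print(positional_encoding(24, 64).shape)   # torch.Size([24, 64])
```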

3.2.4 Scaled Dot-Product Attention

The scaled dot-product attention layer extracts the important information in the time series by learning attention-weighted context vectors, as sketched below.
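The mechanism itself is a few lines; this sketch computes softmax(QKᵀ/√d_k)V directly:

```python
import math
import torch

def scaled_dot_product_attention(q, k, v):
    """Context vectors as attention-weighted sums of the values:
    softmax(Q K^T / sqrt(d_k)) V."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    weights = torch.softmax(scores, dim=-1)
    return weights @ v

q = k = v = torch.randn(8, 24, 64)         # (batch, time steps, d_k)
print(scaled_dot_product_attention(q, k, v).shape)
```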

3.2.5 Multi-Head Attention

Multi-head attention layers allow the model to learn different types of time series features.

3.2.6 Concatenation and Output Layers

The concatenation layer joins the final outputs of the multi-head attention and produces the predicted value through a fully connected layer.

3.2.7 Prediction Distribution

The prediction layer outputs a conditional probability distribution for each time step through an appropriate output activation: for real-valued data this is typically a normal distribution, and for binary outcomes a binomial distribution trained with the cross-entropy loss. A sketch of a Gaussian head follows.
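A minimal sketch of such a head for the normal-distribution case (the layer sizes are illustrative assumptions):

```python
import torch
import torch.nn as nn

class GaussianHead(nn.Module):
    """Maps the network's hidden output to the parameters of a normal
    distribution, one per time step."""
    def __init__(self, d_model=64):
        super().__init__()
        self.mu = nn.Linear(d_model, 1)
        self.sigma = nn.Linear(d_model, 1)

    def forward(self, h):
        mu = self.mu(h).squeeze(-1)
        # softplus keeps the scale strictly positive
        sigma = nn.functional.softplus(self.sigma(h)).squeeze(-1) + 1e-6
        return torch.distributions.Normal(mu, sigma)

h = torch.randn(8, 24, 64)                 # (batch, time steps, d_model)
dist = GaussianHead()(h)
print(dist.sample().shape)                 # torch.Size([8, 24])
```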

3.3 Split point encoding

In deep learning time series prediction tasks, features at different scales often come with different data frequencies. To fuse features across these scales, the Autoformer model introduces split point encoding (Split Point Encoding), which combines time series features of different scales and improves the performance of the model.

3.3.1 Split Point Encoding Overview

Split point encoding can be thought of as a fixed-length encoding matrix, where each encoding vector represents feature information at a different time step.

The specific steps of split point encoding are as follows:

  1. Initialize the split point encoding: randomly initialize a split point encoding matrix.
  2. Update the split point encoding matrix according to the model's prediction results.
  3. Stitch the encoding matrix together with the time series data to compute the forecasts (a speculative sketch follows this list).
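The post gives no reference implementation of split point encoding, so the following is only one speculative reading of the three steps above: the encoding matrix is a learnable parameter (step 1), updated by backpropagation during training (step 2), and concatenated with the series (step 3):

```python
import torch
import torch.nn as nn

class SplitPointEncoding(nn.Module):
    """Speculative sketch: a learnable fixed-length encoding matrix whose
    columns stand for feature information at different time steps."""
    def __init__(self, seq_len=24, d_code=16):
        super().__init__()
        # Step 1: randomly initialize the split point encoding matrix.
        self.code = nn.Parameter(torch.randn(seq_len, d_code))

    def forward(self, x):
        # Step 3: stitch the encoding matrix onto the time series data.
        # Step 2 (updating the matrix) happens via backprop during training.
        batch = x.size(0)
        code = self.code.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([x, code], dim=-1)

x = torch.randn(8, 24, 4)                  # (batch, time steps, features)
print(SplitPointEncoding()(x).shape)       # torch.Size([8, 24, 20])
```
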
3.3.2 Multi-Level Attention Mechanism

In order to capture time series features at different scales, the Autoformer model introduces a multi-level attention mechanism (Multi-Level Attention Mechanism) that attends to the series at each scale.

3.3.3 Level Aggregation Mechanism

Split point encoding adopts a cascade structure to fuse features of different scales. Specifically, it first generates a global encoding vector, then normalizes and smooths each column of the encoding matrix to form encoding vectors at different scales. Finally, the model uses the global encoding vector together with the per-scale encoding vectors for prediction.

3.3.4 Sample Output

The following sample walks through the entire prediction process of the Autoformer model:

Given an input sequence, the observation sequence is $(t_1, t_2, \dots, t_n)$ and the feature sequence is $(x_1, x_2, \dots, x_n)$. The output of the model at time step $t$ is expressed as $\pi(x_t \mid t) = \Theta_\theta(\hat{x}_{t-1}, x_t; c_t^k)$, $k = 1, \dots, K$, where $c_t^k$ denotes the encoding matrix of the $k$-th level.

  1. The feature extraction layer embeds the features into the time series to obtain a feature vector.
  2. The position encoding layer encodes the time steps into position vectors, giving $c_t^1, c_t^2, \dots, c_t^K$.
  3. The scaled dot-product attention layer and the multi-head attention layer generate the encoding matrix $c_t^k$, where each column corresponds to one of the vectors $c_t^1, c_t^2, \dots, c_t^K$.
  4. The attention weights $\alpha_t$ are computed from the context vector of each column.
  5. Next, the parameters $\Theta_t$ are computed from $\alpha_t$ and the embedding of the time step.
  6. $\Theta_t$ is fed into the model to produce the conditional probability distribution $\pi(x_{t+1} \mid t)$ of the next time step.
  7. A global encoding vector $c_g$ is generated.
  8. The split point encoding matrix is updated, then stitched together with the time series data to compute the predicted value.
