[Paper Reading] A Deep Neural Network for Unsupervised Anomaly Detection and Diagnosis in Multivariate Time Series Data

1. The main research content of this article

Paper: *A Deep Neural Network for Unsupervised Anomaly Detection and Diagnosis in Multivariate Time Series Data* (unsupervised anomaly detection and diagnosis in multivariate time series data based on a deep neural network).

Goal: anomaly detection and diagnosis (root cause identification) on multivariate time series data, providing operators with different levels of anomaly scores according to the severity of different incidents.

Existing challenges:
(1) Anomaly labels are scarce or absent in historical data, which makes supervised algorithms infeasible.
(2) Multivariate time series exhibit temporal correlations, so the model must be able to capture temporal dependence across different time steps.
(3) In practical applications, multivariate time series usually contain noise; when the noise becomes severe it can hurt the generalization of a temporal prediction model, so the system should be robust to noise.

The method proposed in the paper: Multi-Scale Convolutional Recurrent Encoder-Decoder (MSCRED).

Its main ideas:
(1) MSCRED first constructs multi-scale (multi-resolution) signature matrices to characterize multiple levels of system status across different time steps, where the different levels represent the severity of different abnormal events.
(2) Given the signature matrices, a convolutional encoder encodes the inter-sensor correlation patterns, and an attention-based Convolutional Long Short-Term Memory (ConvLSTM) network models the temporal information.
(3) A convolutional decoder reconstructs the signature matrices, and a squared loss is used for end-to-end learning.



2. MSCRED Framework

1. Problem Statement

Given $n$ time series $X = (x_1, x_2, \dots, x_n)^T \in \mathbb{R}^{n \times T}$, and assume that the (historical) data contain no anomalies.

Because only normal data are used during training, the network can be understood as having learned what a normal time series looks like.
For test data, if the input differs greatly from the sequence reconstructed by the network, we regard it as an anomaly.
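As a minimal sketch of this reconstruction-based criterion (the function name `anomaly_score` and the threshold `tau` are hypothetical; the paper derives its actual score from the residual signature matrices rather than a single scalar):

```python
import numpy as np

def anomaly_score(m_true: np.ndarray, m_recon: np.ndarray) -> float:
    """Squared Frobenius norm of the reconstruction residual:
    larger means the test step deviates more from normal patterns."""
    return float(np.sum((m_true - m_recon) ** 2))

# tau is a hypothetical threshold; in practice it would be chosen from
# the score distribution on anomaly-free validation data.
tau = 0.05
score = anomaly_score(np.random.rand(30, 30, 3), np.random.rand(30, 30, 3))
print("anomaly" if score > tau else "normal")
```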

Our goals are twofold:

  • Anomaly detection: detect abnormal events at some time step after $T$.
  • Anomaly diagnosis: given a detection result, identify the time series most likely to cause the anomaly, and qualitatively interpret its severity (duration scale).

2. Overview

[Figure: overview of the MSCRED framework: signature matrices, convolutional encoder, attention-based ConvLSTM, convolutional decoder]

(1) Characterizing Status with Signature Matrices

To characterize the correlations between different pairs of time series within the segment from $t-w$ to $t$, we construct an $n \times n$ signature matrix $M^t$.

Each inner-product entry $m_{ij}^t \in M^t$ is computed as:

$$m_{ij}^t = \frac{\sum\limits_{\delta=0}^{w} x_i^{t-\delta}\, x_j^{t-\delta}}{\kappa}$$

where $\kappa$ is a rescaling factor ($\kappa = w$).

To represent system status at different scales, we construct $s$ ($s = 3$) signature matrices with different window lengths ($w = 10, 30, 60$) at every time step. (The paper assumes that the severity of an event is proportional to the duration of the anomaly.)

Each time step therefore corresponds to an $n \times n \times 3$ signature tensor, so $T$ time steps correspond to $T$ such $n \times n \times 3$ tensors.
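A minimal NumPy sketch of this construction (the function name `signature_matrices` is my own; the segment is taken as $x^{t-\delta}$ for $\delta = 0, \dots, w$, matching the formula above):

```python
import numpy as np

def signature_matrices(X: np.ndarray, t: int, windows=(10, 30, 60)) -> np.ndarray:
    """Signature matrices at time step t for n series X of shape (n, T).

    Entry (i, j, k) is the inner product of series i and j over the last
    windows[k] steps, rescaled by kappa = w. Returns shape (n, n, s).
    """
    n = X.shape[0]
    M = np.empty((n, n, len(windows)))
    for k, w in enumerate(windows):
        seg = X[:, t - w : t + 1]      # x^{t-delta} for delta = 0..w
        M[:, :, k] = seg @ seg.T / w   # kappa = w
    return M

X = np.random.randn(30, 10000)          # 30 sensors, 10000 time steps
M_t = signature_matrices(X, t=500)      # one 30 x 30 x 3 signature tensor
```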


(2) Convolutional Encoder

Here a fully convolutional encoder is used to encode the spatial patterns of the system's signature matrices.

First, the signature matrices $M^t$ are concatenated into a tensor $\mathcal{X}^{t,0} \in \mathbb{R}^{n \times n \times s}$, which is then fed through several convolutional layers. Let $\mathcal{X}^{t,l-1} \in \mathbb{R}^{n_{l-1} \times n_{l-1} \times d_{l-1}}$ denote the feature maps of layer $l-1$; the output of layer $l$ is:

$$\mathcal{X}^{t,l} = f(W^l * \mathcal{X}^{t,l-1} + b^l)$$

where $*$ denotes the convolution operation, $f(\cdot)$ is the activation function, $W^l$ is the convolution kernel of layer $l$ with size $k_l \times k_l \times d_{l-1}$, and $b^l$ is the bias term of layer $l$.

In this work, we use the Scaled Exponential Linear Unit (SELU) as the activation function and 4 convolutional layers, namely Conv1-Conv4, with 32 kernels of size 3×3×3, 64 kernels of size 3×3×32, 128 kernels of size 2×2×64, and 256 kernels of size 2×2×128, with strides of 1×1, 2×2, 2×2, and 2×2 respectively.

For example, for 10000 minutes of data from 30 sensors, the input is a $30 \times 10000$ matrix. With 3 time windows of lengths 10, 30, and 60, and an interval of 10 between consecutive steps, we obtain $10000/10 = 1000$ signature tensors of size $30 \times 30 \times 3$. For each $30 \times 30 \times 3$ tensor, the convolution proceeds as:
[Figure: the convolutional encoding process for one 30×30×3 signature tensor through Conv1-Conv4]
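A PyTorch sketch of the encoder under the hyper-parameters above (the padding values are my assumption, chosen so that a 30×30 input yields spatial sizes 30, 15, 8, and 4 across the four layers; the note does not spell the padding out):

```python
import torch
import torch.nn as nn

# Conv1-Conv4: 32/64/128/256 kernels, sizes 3/3/2/2, strides 1/2/2/2,
# SELU activations. Padding is assumed so sizes go 30 -> 30 -> 15 -> 8 -> 4.
encoder = nn.ModuleList([
    nn.Conv2d(3,    32, kernel_size=3, stride=1, padding=1),
    nn.Conv2d(32,   64, kernel_size=3, stride=2, padding=1),
    nn.Conv2d(64,  128, kernel_size=2, stride=2, padding=1),
    nn.Conv2d(128, 256, kernel_size=2, stride=2, padding=0),
])

x = torch.randn(1, 3, 30, 30)   # one 30x30x3 signature tensor (channels first)
feature_maps = []
for conv in encoder:
    x = torch.selu(conv(x))
    feature_maps.append(x)      # each scale later feeds a ConvLSTM and the decoder
```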



(3) Attention-based ConvLSTM

The spatial feature maps produced by the convolutional encoder are temporally dependent on previous time steps. Although ConvLSTM (Shi et al. 2015) was developed to capture temporal information in video sequences, its performance may deteriorate as the sequence length increases.

In ConvLSTM, given the feature maps $\mathcal{X}^{t,l}$ of the $l$-th convolutional layer, the current state is $\mathcal{H}^{t,l} = \text{ConvLSTM}(\mathcal{X}^{t,l}, \mathcal{H}^{t-1,l})$. Concretely:

$$
\begin{aligned}
i^t &= \sigma(W_{xi} * \mathcal{X}^{t,l} + W_{hi} * \mathcal{H}^{t-1,l} + W_{ci} \odot \mathcal{C}^{t-1,l} + b_i)\\
f^t &= \sigma(W_{xf} * \mathcal{X}^{t,l} + W_{hf} * \mathcal{H}^{t-1,l} + W_{cf} \odot \mathcal{C}^{t-1,l} + b_f)\\
\mathcal{C}^{t,l} &= f^t \odot \mathcal{C}^{t-1,l} + i^t \odot \tanh(W_{xc} * \mathcal{X}^{t,l} + W_{hc} * \mathcal{H}^{t-1,l} + b_c)\\
o^t &= \sigma(W_{xo} * \mathcal{X}^{t,l} + W_{ho} * \mathcal{H}^{t-1,l} + W_{co} \odot \mathcal{C}^{t,l} + b_o)\\
\mathcal{H}^{t,l} &= o^t \odot \tanh(\mathcal{C}^{t,l})
\end{aligned}
$$

where $*$ denotes convolution, $\odot$ the Hadamard product, and $\sigma$ the sigmoid function.

To overcome LSTM's limitation in handling spatial information, ConvLSTM turns the 2D inputs of an LSTM into 3D tensors whose last two dimensions are spatial (rows and columns). At each time step $t$, ConvLSTM replaces the fully connected operations in LSTM with convolutions, i.e., it predicts from the current input and the past states of its local neighbors.


To address the above problem with ConvLSTM, this paper develops an attention-based ConvLSTM that adaptively selects the relevant hidden states (feature maps) across different time steps, namely:

$$\hat{\mathcal{H}}^{t,l} = \sum_{i \in (t-h,\, t)} \alpha^i \mathcal{H}^{i,l}, \qquad \alpha^i = \frac{\exp\!\left(\mathrm{Vec}(\mathcal{H}^{i,l})^T \mathrm{Vec}(\mathcal{H}^{t,l}) \,/\, \chi\right)}{\sum_{i' \in (t-h,\, t)} \exp\!\left(\mathrm{Vec}(\mathcal{H}^{i',l})^T \mathrm{Vec}(\mathcal{H}^{t,l}) \,/\, \chi\right)}$$

where $\mathrm{Vec}(\cdot)$ denotes vectorization and $\chi$ is a rescaling factor ($\chi = 5.0$).
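A sketch of this attention step in isolation (the surrounding ConvLSTM is omitted; the function name `temporal_attention` and the context length of 5 steps are my own choices):

```python
import torch

def temporal_attention(H: torch.Tensor, chi: float = 5.0) -> torch.Tensor:
    """Attention over the hidden feature maps of the last h steps.

    H: (h, C, N, N), where H[-1] is the current step t.
    Returns the attention-weighted map H_hat^{t,l} of shape (C, N, N).
    """
    h = H.shape[0]
    flat = H.reshape(h, -1)               # Vec(H^{i,l}) for each step i
    logits = flat @ flat[-1] / chi        # similarity to the current state
    alpha = torch.softmax(logits, dim=0)  # attention weights over steps
    return (alpha.view(h, 1, 1, 1) * H).sum(dim=0)

H = torch.randn(5, 256, 4, 4)             # e.g. 5 steps of Conv4-level maps
H_hat = temporal_attention(H)              # shape (256, 4, 4)
```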


(4) Convolutional Decoder

The convolutional decoder is defined as:

$$\hat{\mathcal{X}}^{t,l-1} = \begin{cases} f(\hat{W}^{l} \circledast \hat{\mathcal{H}}^{t,l} + \hat{b}^{l}), & l = 4\\ f(\hat{W}^{l} \circledast [\hat{\mathcal{H}}^{t,l} \oplus \hat{\mathcal{X}}^{t,l}] + \hat{b}^{l}), & l = 3, 2, 1 \end{cases}$$

where $\circledast$ denotes the deconvolution (transposed convolution) operation and $\oplus$ denotes concatenation. Each decoder layer deconvolves the attention-adjusted feature maps $\hat{\mathcal{H}}^{t,l}$, concatenated with the output $\hat{\mathcal{X}}^{t,l}$ of the previous decoder layer, until the final reconstruction $\hat{\mathcal{X}}^{t,0}$ of the signature tensor is obtained.
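A PyTorch sketch of this decoding pass (the padding/output_padding values are my assumption, chosen to invert the encoder's 4, 8, 15, 30 spatial sizes; the random tensors stand in for the attention-adjusted ConvLSTM outputs at each scale):

```python
import torch
import torch.nn as nn

# DeConv4..DeConv1 mirror Conv4..Conv1; padding/output_padding assumed
# so spatial sizes go 4 -> 8 -> 15 -> 30.
deconv4 = nn.ConvTranspose2d(256,       128, 2, stride=2)
deconv3 = nn.ConvTranspose2d(128 + 128,  64, 2, stride=2, padding=1, output_padding=1)
deconv2 = nn.ConvTranspose2d(64 + 64,    32, 3, stride=2, padding=1, output_padding=1)
deconv1 = nn.ConvTranspose2d(32 + 32,     3, 3, stride=1, padding=1)

# Stand-ins for the attention-adjusted hidden maps H_hat^{t,l} at each scale:
h4, h3 = torch.randn(1, 256, 4, 4),  torch.randn(1, 128, 8, 8)
h2, h1 = torch.randn(1, 64, 15, 15), torch.randn(1, 32, 30, 30)

x3 = torch.selu(deconv4(h4))                          # 8x8x128
x2 = torch.selu(deconv3(torch.cat([h3, x3], dim=1)))  # 15x15x64
x1 = torch.selu(deconv2(torch.cat([h2, x2], dim=1)))  # 30x30x32
x0 = torch.selu(deconv1(torch.cat([h1, x1], dim=1)))  # reconstructed 30x30x3
```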


3. Loss Function

The reconstruction error of the signature matrices is defined as:

$$\mathcal{L} = \sum_{t} \sum_{c=1}^{s} \left\| \mathcal{X}^{t,0}_{:,:,c} - \hat{\mathcal{X}}^{t,0}_{:,:,c} \right\|_F^2$$

i.e., the sum over all training steps and channels of the squared Frobenius norm of the residual between each input signature matrix and its reconstruction.
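In PyTorch this amounts to an unreduced squared error over the residual tensor; a minimal sketch (the function name `mscred_loss` is my own):

```python
import torch

def mscred_loss(x0: torch.Tensor, x0_hat: torch.Tensor) -> torch.Tensor:
    """Sum of squared Frobenius norms of the per-channel residuals
    between input signature matrices and their reconstructions."""
    return ((x0 - x0_hat) ** 2).sum()

x0 = torch.randn(8, 3, 30, 30)                           # batch of signature tensors
x0_hat = torch.randn(8, 3, 30, 30, requires_grad=True)   # decoder output stand-in
loss = mscred_loss(x0, x0_hat)
loss.backward()                                          # end-to-end training signal
```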

Source: https://blog.csdn.net/qq_42757191/article/details/126367108