Temporal models (1): TCN (Temporal Convolutional Network)

 

1. Overview

TCN (Temporal Convolutional Network) is a convolutional neural network model for time series proposed in 2018.

Sequence modeling problems are usually handled with RNNs (recurrent neural networks) and related variants such as LSTM and GRU. TCN instead uses a convolutional network, capturing long-range dependencies through dilated convolution, and can even outperform RNN-based models.

Reference paper: An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling

GitHub: https://github.com/LOCUSLAB/tcn

2. Principle

2.1 Causal Convolution

Let's first introduce causal convolution.


In causal convolution, the value at a given time step depends only on the values at the same and earlier time steps in the layer below. Causal convolution cannot see future data; it is a one-way structure in which the cause must come before the effect. However, a plain causal convolution can only look back a fixed number of time steps. To capture information over longer distances, the number of network layers must grow accordingly, which is what motivated the idea of dilated convolution.
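To make this concrete, here is a minimal sketch of a 1D causal convolution in PyTorch (the class name CausalConv1d is my own, not from the paper's code). Causality is enforced by padding only on the left before convolving, so the output at time t never touches inputs later than t:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConv1d(nn.Module):
    """1D convolution that only sees the current and past time steps."""
    def __init__(self, in_channels, out_channels, kernel_size):
        super().__init__()
        self.left_pad = kernel_size - 1   # left-only padding keeps the output causal
        self.conv = nn.Conv1d(in_channels, out_channels, kernel_size)

    def forward(self, x):                 # x: (batch, channels, time)
        x = F.pad(x, (self.left_pad, 0))  # (left, right) padding on the time axis
        return self.conv(x)

x = torch.randn(1, 3, 10)
print(CausalConv1d(3, 8, kernel_size=3)(x).shape)  # torch.Size([1, 8, 10])
```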

2.2 Dilated Convolution

Dilated convolution is also called atrous convolution in some places.

Dilated convolution samples the input to a convolution at intervals, with the interval controlled by the dilation factor d. At the bottom layer, d = 1 means every point is taken as input; at the next layer, d = 2 means every second point is taken. Generally speaking, the higher the layer, the larger d becomes. Dilated convolution therefore makes the effective window size grow exponentially with the number of layers, so the convolutional network obtains a large receptive field with comparatively few layers.
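A sketch of the same idea with dilation (again PyTorch; the helper name is mine). With dilation d, the kernel taps inputs d steps apart, so the left padding becomes (k - 1) * d, and stacking layers with d = 1, 2, 4, ... doubles the receptive field at each level:

```python
import torch
import torch.nn as nn

def dilated_causal_conv(in_ch, out_ch, kernel_size, dilation):
    # Left-pad by (k - 1) * d so the output stays causal and length-preserving.
    return nn.Sequential(
        nn.ConstantPad1d(((kernel_size - 1) * dilation, 0), 0.0),
        nn.Conv1d(in_ch, out_ch, kernel_size, dilation=dilation),
    )

x = torch.randn(1, 1, 32)
h = dilated_causal_conv(1, 4, kernel_size=2, dilation=1)(x)  # adjacent taps
h = dilated_causal_conv(4, 4, kernel_size=2, dilation=2)(h)  # taps 2 apart
h = dilated_causal_conv(4, 4, kernel_size=2, dilation=4)(h)  # taps 4 apart
print(h.shape)  # torch.Size([1, 4, 32]); receptive field is now 8 steps
```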

2.3 Residual Connections

Residual connections let the network pass information across layers, avoiding the information loss that comes with too many layers. The paper constructs a residual block to replace a single convolutional layer: each residual block contains two layers of dilated causal convolution with a nonlinear mapping, and each layer additionally applies WeightNorm and Dropout to regularize the network.
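A sketch of such a residual block following the paper's description (PyTorch; the "chomp" trick that trims the right-side padding overhang mirrors the reference implementation's approach, but treat this as an illustration rather than the official code):

```python
import torch.nn as nn
from torch.nn.utils import weight_norm

class ResidualBlock(nn.Module):
    """Two dilated causal conv layers, each with WeightNorm, ReLU and Dropout,
    wrapped in a residual connection."""
    def __init__(self, in_ch, out_ch, kernel_size, dilation, dropout=0.2):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation
        self.conv1 = weight_norm(nn.Conv1d(in_ch, out_ch, kernel_size,
                                           padding=self.pad, dilation=dilation))
        self.conv2 = weight_norm(nn.Conv1d(out_ch, out_ch, kernel_size,
                                           padding=self.pad, dilation=dilation))
        self.relu = nn.ReLU()
        self.drop = nn.Dropout(dropout)
        # 1x1 conv on the skip path so the residual can be added when channels differ
        self.downsample = nn.Conv1d(in_ch, out_ch, 1) if in_ch != out_ch else None

    def _chomp(self, x):
        # symmetric padding added self.pad extra steps on the right; trim them
        return x[:, :, :-self.pad] if self.pad > 0 else x

    def forward(self, x):
        out = self.drop(self.relu(self._chomp(self.conv1(x))))
        out = self.drop(self.relu(self._chomp(self.conv2(out))))
        res = x if self.downsample is None else self.downsample(x)
        return self.relu(out + res)
```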

 

3. Experiment

In the paper, TCN is compared against LSTM, GRU, and vanilla RNN baselines on a broad set of sequence tasks (the adding problem, copy memory, sequential and permuted MNIST, polyphonic music, and word- and character-level language modeling), matching or outperforming the recurrent models on most of them.

4. Advantages and disadvantages

Advantages:

    (1) Parallelism. Given a sentence, TCN can process it in parallel, without the step-by-step sequential processing an RNN requires.

    (2) Flexible receptive field. The receptive field of a TCN is determined by the number of layers, the kernel size, and the dilation factor, so it can be flexibly tailored to different tasks and data characteristics; see the receptive field sketch after this list.

    (3) Stable gradients. RNNs often suffer from vanishing and exploding gradients, mainly because parameters are shared across time steps. Like a traditional convolutional neural network, TCN does not suffer from vanishing or exploding gradients.

    (4) Lower memory. An RNN must keep the information of every step when it runs, which can occupy a large amount of memory. A TCN's convolution kernels are shared within each layer, so its memory usage is lower.
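As promised under advantage (2), here is a back-of-envelope receptive field calculation (my own helper, assuming the common TCN setup of two conv layers per residual block and dilations doubling per block):

```python
def receptive_field(kernel_size, num_blocks, convs_per_block=2):
    """Each conv with kernel size k and dilation d adds (k - 1) * d steps."""
    rf = 1
    for i in range(num_blocks):            # dilations 1, 2, 4, ...
        rf += convs_per_block * (kernel_size - 1) * 2 ** i
    return rf

print(receptive_field(kernel_size=3, num_blocks=4))  # 61 = 1 + 2*2*(1+2+4+8)
```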

Disadvantages:

    (1) TCN may be less adaptable in transfer learning. In different domains, the amount of history a model needs for prediction can differ. When a model is migrated from a problem that needs only a short memory to one that needs a much longer memory, TCN may perform poorly because its receptive field is not large enough.

    (2) The TCN described in the paper is still a one-way (causal) structure. In tasks such as speech recognition and speech synthesis, a purely one-way structure is genuinely useful, but most text tasks use a bidirectional structure. TCN can easily be extended to a bidirectional structure by replacing the causal convolution with a traditional (non-causal) convolution; see the sketch after this list.

    (3) TCN is, after all, a variant of the convolutional neural network. Although dilated convolution enlarges the receptive field, the receptive field is still finite; compared with the Transformer, which can capture relevant information at arbitrary distances, TCN is at a disadvantage. Its application to text remains to be tested.
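For completeness, the bidirectional extension mentioned in disadvantage (2) amounts to replacing the causal, left-only padding with ordinary symmetric padding (a one-line PyTorch illustration of my own):

```python
import torch.nn as nn

# Non-causal variant: symmetric padding lets the output at time t see
# both past and future context (t - 1, t, t + 1 for kernel_size=3).
noncausal = nn.Conv1d(8, 8, kernel_size=3, padding=1)
```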

 

5. Reference links:

TCN - Temporal Convolutional Network: https://blog.csdn.net/qq_27586341/article/details/90751794

TCN temporal convolutional network: https://zhuanlan.zhihu.com/p/51246745?utm_source=wechat_session&utm_medium=social&s_r=0
