A seq2seq model based entirely on convolutional neural networks

References for this post:
Gehring J, Auli M, Grangier D, et al. Convolutional Sequence to Sequence Learning. arXiv preprint arXiv:1705.03122, 2017.
Times cited: 13

Dauphin Y N, Fan A, Auli M, et al. Language Modeling with Gated Convolutional Networks. arXiv preprint arXiv:1612.08083, 2016.
Times cited: 24
 
The model discussed today is the seq2seq framework that Facebook AI Research published, built entirely on convolutional neural networks. I have written about seq2seq before: the traditional seq2seq model is implemented with RNNs, especially LSTMs, which brings problems of computational complexity. Facebook made a bold change and replaced the encoder, the decoder, the attention mechanism, and even the memory unit entirely with convolutional neural networks. Isn't the idea delightfully blunt? Although a single CNN only sees a fixed-size context, stacking several CNN layers easily enlarges the effective context range. Facebook applied this model to English-French and English-German machine translation and not only set new records on both, but also trained an order of magnitude faster, whether on GPU or CPU.
 
Before describing Facebook's conv seq2seq model in detail, we need to look at the Gated CNN, a language model that Facebook proposed at the end of 2016.
 
The Gated CNN model for language modeling is shown below. As you can see, the word embedding step at the top is no different from traditional language modeling. The embedding vectors are then split into time windows and convolved. Note that two convolutional networks are used here: the output of one is passed through an activation function and multiplied element-wise with the output of the other to give the final output. By now the reader will have noticed that one of the two convolutions plays the role of a gate, controlling how much useful information reaches the final output. The results also show that the Gated CNN achieves good performance on WikiText-103.
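To make the gating concrete, here is a minimal PyTorch sketch of such a gated convolution block; the class name, layer sizes, and the causal-padding trick are my own illustrative assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

class GatedConv1d(nn.Module):
    """One gated convolution layer in the spirit of Dauphin et al.'s Gated CNN."""
    def __init__(self, channels, kernel_size):
        super().__init__()
        # Two parallel convolutions over time: one produces the candidate output A,
        # the other produces the gate B.
        self.conv_a = nn.Conv1d(channels, channels, kernel_size, padding=kernel_size - 1)
        self.conv_b = nn.Conv1d(channels, channels, kernel_size, padding=kernel_size - 1)

    def forward(self, x):
        # x: (batch, channels, time). Trim the extra positions on the right so that
        # position t only depends on inputs up to t (causal, as in language modeling).
        a = self.conv_a(x)[:, :, : x.size(2)]
        b = self.conv_b(x)[:, :, : x.size(2)]
        # Gated Linear Unit: sigmoid(B) decides how much of A passes through.
        return a * torch.sigmoid(b)
```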
 

The conv seq2seq paper also uses the Gated CNN, together with residual connections. The structure of the model is shown in the diagram below; next I will explain the details of the computation step by step.
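As a rough illustration of how a residual connection wraps the gated convolution, the following continues the hypothetical GatedConv1d sketch above, so it is illustrative only.

```python
class ResidualGatedBlock(nn.Module):
    """Gated convolution plus a residual (skip) connection."""
    def __init__(self, channels, kernel_size):
        super().__init__()
        self.gated_conv = GatedConv1d(channels, kernel_size)

    def forward(self, x):
        # Adding the input back keeps gradients flowing through deep stacks,
        # and stacking many such blocks enlarges the effective context window.
        return self.gated_conv(x) + x
```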
 

For the encoder, the source words first pass through an embedding layer to obtain their embedding vectors, which are then fed into the Gated CNN. Note that, in order to keep the sequence length after convolution the same as before, the convolution input needs to be padded. The GLU (Gated Linear Unit) is used in two places in the model, which I have marked in red in the figure: the encoder embeddings and the decoder embeddings each go through their own GLU units, and taking the dot product of the two resulting states gives the attention weight matrix (marked in red as "attention" in the figure). The attention weights are computed by the formula below:
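Following the notation of Gehring et al. (2017), the weight of source position j for decoder position i is a softmax over dot products:

$$a_{ij} = \frac{\exp(d_i \cdot z_j)}{\sum_{t=1}^{m} \exp(d_i \cdot z_t)}, \qquad d_i = W_d h_i + b_d + g_i$$

Here $h_i$ is the decoder state at target position $i$, $g_i$ is the embedding of the previous target word, and $z_j$ is the encoder output at source position $j$.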
 

 
Notice in the figure that the encoder embeddings and the encoder states are summed and then weighted by the attention weights; the result is called the conditional input c. Compare this with the conventional attention mechanism: conventional attention multiplies the attention weights directly with the encoder states, whereas here the embeddings are added in as well. The paper explains that the embeddings provide point information about specific input elements, which helps when making predictions. The conditional input c is computed by the formula below:
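In the paper's notation, the conditional input at decoder position i is:

$$c_i = \sum_{j=1}^{m} a_{ij}\,(z_j + e_j)$$

where $z_j$ is the encoder output and $e_j$ is the embedding of the j-th source word.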

 
Adding the conditional input c to the decoder state gives the probability distribution over the output sequence. That is the whole conv seq2seq model structure. On machine translation the authors ended up nearly 10x faster than other RNN-based models!
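Putting the two formulas together, a minimal PyTorch sketch of the attention step might look like the following; the tensor names and shapes are assumptions for illustration, not the reference implementation.

```python
import torch

def conv_seq2seq_attention(h, g, z, e, W_d, b_d):
    """h: decoder states             (batch, tgt_len, dim)
       g: previous target embeddings (batch, tgt_len, dim)
       z: encoder outputs            (batch, src_len, dim)
       e: encoder input embeddings   (batch, src_len, dim)
       W_d, b_d: decoder-state projection, shapes (dim, dim) and (dim,)"""
    d = h @ W_d + b_d + g              # d_i = W_d h_i + b_d + g_i
    scores = d @ z.transpose(1, 2)     # dot products d_i . z_j -> (batch, tgt_len, src_len)
    a = torch.softmax(scores, dim=-1)  # attention weights a_ij
    c = a @ (z + e)                    # conditional input c_i = sum_j a_ij (z_j + e_j)
    return c                           # later added to the decoder state h
```

The returned conditional input is then added to the decoder state before the final softmax over output words, as described above.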


Origin www.cnblogs.com/mfryf/p/11373185.html