"Dive into Deep Learning" 14-Day Hands-On Challenge, Task 2: Notes on the Attention Mechanism and the Seq2seq Model

Attention Mechanism

At every time step, the decoder relies on the same context vector to obtain information about the input sequence. When the encoder is a recurrent neural network, the context vector comes from its hidden state at the final time step: the source sequence is encoded into the recurrent unit's state, which is then passed to the decoder to generate the target sequence. This structure has a problem, however. In practice, RNNs suffer from vanishing gradients over long ranges, and for long sentences we can hardly expect a single fixed-length vector to preserve all the useful information in the input sequence, so the performance of this architecture degrades noticeably as the sentences to be translated grow longer.
![figure](https://img-blog.csdnimg.cn/20200219214555216.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzQyOTg0NTQy,size_16,color_FFFFFF,t_70)
At the same time, each decoded target word may be related only to part of the input rather than to all of it. For example, when translating "Hello world" into "Bonjour le monde", "Hello" maps to "Bonjour" and "world" maps to "monde". In the seq2seq model, the decoder can only select the relevant information implicitly from the encoder's final state; the attention mechanism models this selection process explicitly.
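To make the bottleneck concrete, here is a minimal PyTorch sketch (the tensor names and sizes are illustrative assumptions, not code from this post) in which a GRU encoder compresses an entire source sentence into the hidden state of its final time step; the decoder would see only this one fixed-length vector, no matter how long the input is.

```python
# Minimal sketch of the fixed-length context-vector bottleneck (illustrative sizes).
import torch
import torch.nn as nn

vocab_size, embed_size, num_hiddens = 1000, 32, 64

embedding = nn.Embedding(vocab_size, embed_size)
encoder_rnn = nn.GRU(embed_size, num_hiddens, batch_first=True)

src = torch.randint(0, vocab_size, (2, 7))      # (batch=2, src_len=7)
outputs, state = encoder_rnn(embedding(src))    # outputs: (2, 7, 64)

context = state[-1]                             # (2, 64): one vector per sentence,
                                                # regardless of the input length
```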

Attention Mechanism Framework

Attention is a generalized form of weighted pooling. Its input consists of two parts: a query and a set of key-value pairs, with $k_i \in \mathbb{R}^{d_k}$ and $v_i \in \mathbb{R}^{d_v}$. Given a query $q \in \mathbb{R}^{d_q}$, the attention layer returns an output $o \in \mathbb{R}^{d_v}$ of the same dimension as the values. For a given query, the attention layer computes an attention score with every key and normalizes the scores into weights; the output is the weighted sum of the values, where each weight corresponds to one key-value pair.

To compute the output, we first assume there is a score function $\alpha$ that measures the similarity between the query and a key. The attention scores $a_1, \ldots, a_n$ over all the keys are then computed as

$$a_i = \alpha(q, k_i).$$

We then apply the softmax function to these scores to obtain the attention weights:

$$b_1, \ldots, b_n = \mathrm{softmax}(a_1, \ldots, a_n).$$

The final output is the weighted sum of the values:

$$o = \sum_{i=1}^{n} b_i v_i.$$
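Putting the three formulas together, here is a minimal sketch of generic attention pooling (the tensor names and the dot-product score used in the demo are assumptions for illustration). The score function $\alpha$ is passed in as an argument, since, as noted below, attention layers differ only in how the scores are computed.

```python
# Generic attention pooling following the three formulas above (illustrative).
import torch
import torch.nn.functional as F

def attention_pool(query, keys, values, score_fn):
    """query: (d_q,), keys: (n, d_k), values: (n, d_v) -> output: (d_v,)."""
    scores = torch.stack([score_fn(query, k) for k in keys])  # a_1, ..., a_n
    weights = F.softmax(scores, dim=0)                        # b_1, ..., b_n
    return weights @ values                                   # o = sum_i b_i * v_i

# Usage with a simple dot-product score (assumes d_q == d_k):
q = torch.randn(8)
K = torch.randn(5, 8)
V = torch.randn(5, 16)
o = attention_pool(q, K, V, score_fn=lambda q, k: torch.dot(q, k))
print(o.shape)  # torch.Size([16]) -- same dimension as the values
```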
Different attention layers differ in their choice of score function. In the remainder of this section, we discuss two commonly used attention layers: dot-product attention and multilayer perceptron (MLP) attention.
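As a rough illustration (the layer names and sizes below are assumptions, not the post's code), the two score functions could look like the following sketches, both pluggable into `attention_pool` above: a scaled dot product that requires the query and key to have the same dimension, and a small MLP that projects both into a shared hidden space.

```python
# Illustrative sketches of the two score functions named above.
import math
import torch
import torch.nn as nn

def dot_product_score(q, k):
    # Dot-product attention: alpha(q, k) = <q, k> / sqrt(d); requires d_q == d_k.
    return torch.dot(q, k) / math.sqrt(q.shape[0])

class MLPScore(nn.Module):
    # MLP attention: project the query and key to a common hidden size,
    # then score with one output unit: alpha(q, k) = v^T tanh(W_q q + W_k k).
    def __init__(self, d_q, d_k, num_hiddens):
        super().__init__()
        self.W_q = nn.Linear(d_q, num_hiddens, bias=False)
        self.W_k = nn.Linear(d_k, num_hiddens, bias=False)
        self.v = nn.Linear(num_hiddens, 1, bias=False)

    def forward(self, q, k):
        return self.v(torch.tanh(self.W_q(q) + self.W_k(k))).squeeze(-1)
```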



Source: blog.csdn.net/qq_42984542/article/details/104401430