Attention mechanism explained (reprint)

Original address: Detailed explanation of the Attention model, https://terrifyzhao.github.io/2019/01/04/Attention.html

 

Attention is a technique that lets a model focus on the important information and learn from it fully. It is not a complete model in itself; rather, it is a technique that can be applied to any sequence model.

 

Traditional Seq2Seq

Before explaining Attention, let's briefly review the Seq2Seq model. Traditional machine translation is basically built on the Seq2Seq model. The model is divided into an encoder layer and a decoder layer, both composed of RNNs or RNN variants, as shown in the figure below:

 

 

 

In the encode phase, the first node takes a word as input, and each subsequent node takes the next word together with the hidden state of the previous node. At the end, the encoder outputs a context, and this context is passed to the decoder as input. Each decoder node outputs one translated word and also passes its hidden state as input to the next node. This model works very well for translating short texts, but it has a drawback: if the text is a bit longer, it easily loses some of the information in the text. Attention emerged to solve this problem.
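To make the encoder/decoder flow above concrete, here is a minimal numpy sketch of a plain Seq2Seq setup. The dimensions, random weights, and the use of a vanilla RNN cell are assumptions for illustration only; they are not from the original post, and real systems would add embeddings and an output projection.

```python
import numpy as np

def rnn_step(x, h_prev, Wx, Wh, b):
    # One vanilla RNN step: new hidden state from the current input and the previous hidden state.
    return np.tanh(x @ Wx + h_prev @ Wh + b)

# Toy dimensions and random weights (assumptions for illustration only).
emb_dim, hid_dim = 4, 8
rng = np.random.default_rng(0)
Wx_e, Wh_e, b_e = rng.normal(size=(emb_dim, hid_dim)), rng.normal(size=(hid_dim, hid_dim)), np.zeros(hid_dim)
Wx_d, Wh_d, b_d = rng.normal(size=(emb_dim, hid_dim)), rng.normal(size=(hid_dim, hid_dim)), np.zeros(hid_dim)

# Encoder: consume the source-word vectors one node at a time.
src = rng.normal(size=(5, emb_dim))        # 5 source "word" embeddings
h = np.zeros(hid_dim)
for x in src:
    h = rnn_step(x, h, Wx_e, Wh_e, b_e)
context = h                                # only the LAST hidden state is kept as the context

# Decoder: this single context vector is all the decoder ever sees from the encoder.
h_dec = context
for x in rng.normal(size=(4, emb_dim)):    # 4 target-side inputs (e.g. previously emitted words)
    h_dec = rnn_step(x, h_dec, Wx_d, Wh_d, b_d)
    # a projection of h_dec would produce the translated word at this step
```

Because everything the decoder knows about the source sentence is squeezed into that single `context` vector, longer inputs tend to lose information, which is exactly the weakness Attention addresses.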

 

Attention mechanism

Attention, as its name suggests, means paying attention: during the decode stage, the model selects the context that best matches the current node as its input. Attention differs from the traditional Seq2Seq model in the following two ways.

 

-> The encoder provides more data to the decoder: it passes the hidden states of all of its nodes to the decoder, not just the hidden state of the last node.

 

-> The decoder does not directly take all of the encoder's hidden states as input. Instead, it adopts a selection mechanism that picks out the hidden states most relevant to the current position. The specific steps are as follows:

  • Determine which hidden states are most closely related to the current node
  • Calculate a score for each hidden state (exactly how is explained below)
  • Apply softmax to the scores, so that hidden states with high relevance get larger weights and those with low relevance get smaller weights

 

Let's walk through the score calculation in detail with a concrete example:

The hidden state of each encoder node is multiplied by the hidden state of the previous decoder node. As shown below, h1, h2 and h3 are each multiplied by the previous decoder node's hidden state (if the current node is the first decoder node, a randomly initialized hidden state is used). This yields three values, which are the scores of the three hidden states mentioned above; note that these scores are different for each decoder node. The scores are then passed through softmax, and the resulting values are the weights of each encoder node's hidden state with respect to the current decoder node. The original hidden states are multiplied by their weights and summed up, and the result is the hidden state (context) for the current decoder node. As you can see, the key to Attention is really the calculation of the scores.
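The procedure just described can be written in a few lines. This is a minimal sketch, assuming numpy, toy dimensions, and random vectors in place of real encoder/decoder states; the dot product as the score, the softmax, and the weighted sum follow the steps in the paragraph above.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
hid_dim = 8
encoder_hs = rng.normal(size=(3, hid_dim))   # h1, h2, h3: hidden states of the encoder nodes
h_dec_prev = rng.normal(size=hid_dim)        # previous decoder node's hidden state
                                             # (randomly initialized for the first decoder node)

scores = encoder_hs @ h_dec_prev             # dot product of each encoder hidden state with h_dec_prev
weights = softmax(scores)                    # high score -> large weight, low score -> small weight
context = weights @ encoder_hs               # weighted sum of the original hidden states

print(weights.round(3), context.shape)       # weights sum to 1; context has shape (hid_dim,)
```

The weights change at every decoder step because `h_dec_prev` changes, which is why each decoder node can attend to different parts of the source sentence.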

 

 

Now that we understand how the context for each node is obtained, the next step is the working principle of the decoder layer. The specific process is as follows:

The first decoder node is initialized with a vector, and the context for the current node is computed as described above. These are fed into the RNN node as the first input, producing a new hidden state and an output value. Note that there is a big difference from Seq2Seq here: Seq2Seq directly uses the output value as the output of the current node, whereas Attention concatenates that value with the context, and the concatenated result is fed into a feed-forward neural network, which finally determines the output of the current node. These steps are repeated until every decoder node has produced its output.
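Below is a minimal sketch of that final step, again assuming numpy, toy dimensions, and random vectors standing in for the real decoder state and attention context; the weight matrix `W_ff` is a hypothetical stand-in for the feed-forward layer mentioned above.

```python
import numpy as np

rng = np.random.default_rng(0)
hid_dim, vocab_size = 8, 10

h_dec = rng.normal(size=hid_dim)               # hidden state produced by the decoder RNN at this step
context = rng.normal(size=hid_dim)             # attention context computed as in the previous sketch

# Unlike plain Seq2Seq, the RNN output is not used directly:
# it is first concatenated with the attention context.
concat = np.concatenate([h_dec, context])

# A feed-forward layer over the concatenation decides the word emitted by this node.
W_ff = rng.normal(size=(2 * hid_dim, vocab_size))
logits = concat @ W_ff
next_word = int(np.argmax(logits))
print(next_word)
```

The same computation is repeated at every decoder step, with a freshly computed context each time.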

 

The Attention model does not just blindly align the first output word with the first input word. In fact, during the training phase it learns how to align words between the two languages (French and English in our example).

 


Origin www.cnblogs.com/ruili07/p/11497033.html