seq2seq (1) - Encoder-Decoder architecture

Zero

seq2seq stands for sequence-to-sequence learning. Its key property is that both the input sequence and the output sequence can be of variable length, which makes the approach very flexible; machine translation is a typical example of such a task.


One

The basic seq2seq network architecture is as follows:

As you can see, the encoder is one RNN network and the decoder is another RNN network. The training process and the inference process differ in a few places, which are described below.
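To make this concrete, here is a minimal sketch of such an encoder-decoder in Keras (the original post does not name a framework, and the vocabulary sizes and hidden size below are made-up values for illustration):

```python
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical sizes, for illustration only
num_encoder_tokens = 5000   # source-language vocabulary size
num_decoder_tokens = 6000   # target-language vocabulary size
latent_dim = 256            # RNN hidden-state size

# Encoder RNN: only the final hidden/cell states are kept, not the per-timestep outputs
encoder_inputs = keras.Input(shape=(None, num_encoder_tokens))
_, state_h, state_c = layers.LSTM(latent_dim, return_state=True)(encoder_inputs)
encoder_states = [state_h, state_c]

# Decoder RNN: initialized with the encoder's final states, predicts the target-language tokens
decoder_inputs = keras.Input(shape=(None, num_decoder_tokens))
decoder_lstm = layers.LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)
decoder_dense = layers.Dense(num_decoder_tokens, activation="softmax")
decoder_outputs = decoder_dense(decoder_outputs)

model = keras.Model([encoder_inputs, decoder_inputs], decoder_outputs)
```

Here an LSTM is used, so "the last hidden state" is really the pair of hidden and cell states; with a GRU there would be a single state vector.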

Training process:

  1. The encoder is an RNN network. It takes the source-language text as input; the per-timestep outputs are not needed, only the hidden state of the last timestep, which is used as the initial state of the decoder;
  2. The decoder is also an RNN network. Its input is the target-language text shifted back by one position (note this lag); its output is the target-language text itself, trained as a multi-class classification problem with categorical cross-entropy loss, as in the example below.
# input sentence
How are you
# output sentence
I am fine
# encoder input
["How", "are", "you"]
# decoder input
["<start tag>", "I", "am", "fine"]
# decoder target
["I", "am", "fine", "<end tag>"]
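Continuing the sketch above, training then reduces to an ordinary supervised fit, where the decoder input is the target sentence prefixed with <start tag> and the decoder target is the same sentence shifted one step to the left and ending with <end tag>. The *_data arrays below are assumed to be one-hot encoded batches; their names are hypothetical:

```python
#   encoder_input_data:  (num_samples, max_source_len, num_encoder_tokens)
#   decoder_input_data:  (num_samples, max_target_len, num_decoder_tokens), starts with <start tag>
#   decoder_target_data: decoder_input_data shifted left by one step, ends with <end tag>
model.compile(optimizer="rmsprop", loss="categorical_crossentropy")
model.fit([encoder_input_data, decoder_input_data], decoder_target_data,
          batch_size=64, epochs=20, validation_split=0.2)
```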

Inference process:

At inference time only the encoder input (the source sentence) is given, so a decoding method is needed, such as greedy search, sampling, or beam search; the simplest of these, greedy decoding, is discussed below.

  1. The encoder encodes the source-language input into the hidden state of its last timestep;
  2. The target-language input is set to the single word <start tag> and fed to the decoder, which outputs one target word;
  3. The target word from the previous step is used as the new target-language input, and step 2 is repeated until an <end tag> is produced or the length of the generated sequence exceeds a threshold, as in the sketch below.
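A minimal greedy-decoding sketch, continuing the Keras code above (start_id and end_id are the assumed vocabulary indices of <start tag> and <end tag>):

```python
import numpy as np

# Standalone inference models that reuse the layers defined in the training sketch above
encoder_model = keras.Model(encoder_inputs, encoder_states)

state_h_in = keras.Input(shape=(latent_dim,))
state_c_in = keras.Input(shape=(latent_dim,))
dec_out, h, c = decoder_lstm(decoder_inputs, initial_state=[state_h_in, state_c_in])
decoder_model = keras.Model([decoder_inputs, state_h_in, state_c_in],
                            [decoder_dense(dec_out), h, c])

def greedy_decode(source_seq, start_id, end_id, max_len=50):
    """source_seq: one-hot batch of a single source sentence, shape (1, src_len, num_encoder_tokens)."""
    state_h, state_c = encoder_model.predict(source_seq)   # step 1: encode the source
    token_id = start_id                                     # step 2: start from <start tag>
    decoded = []
    for _ in range(max_len):                                # length threshold
        prev = np.zeros((1, 1, num_decoder_tokens))
        prev[0, 0, token_id] = 1.0                          # one-hot of the previous target word
        probs, state_h, state_c = decoder_model.predict([prev, state_h, state_c])
        token_id = int(np.argmax(probs[0, -1, :]))          # greedy: take the most probable word
        if token_id == end_id:                              # step 3: stop at <end tag>
            break
        decoded.append(token_id)
    return decoded
```

Replacing the argmax with sampling from probs gives the sampling decoder; keeping the top-k partial hypotheses at each step instead of a single one gives beam search.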

Two

The above is the most basic seq2seq architecture. It is simple, but its drawback is also obvious: when we humans translate a piece of text, each target-language word usually depends on only one or two words of the source text, whereas in the approach above the entire source text is encoded into a single fixed-length hidden state, so every word produced by the decoder is influenced by the same fixed state, with no differentiation or focus. The next post will introduce an important optimization of seq2seq that addresses this - the attention mechanism.
