[History of Attention Evolution] Translation Model seq2seq (Part 2)

Link to Part 1: an introduction to RNNs

In the previous article, we briefly covered what an RNN is and how data actually flows through the model. In short, we treat the inside of the RNN as a black box: we feed the input words into it one at a time, and at each step the output is passed along as part of the next step's input, so that the meaning of everything read so far is combined with the meaning of the current word. In this article, we introduce seq2seq, one of the classic models for translation tasks.
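As a quick refresher, here is a minimal sketch of that loop in PyTorch. The dimensions and variable names are made up for illustration; this is the idea of the recurrence, not code from the previous article:

```python
import torch
import torch.nn as nn

# A single RNN cell: at each step it combines the current word vector
# with the hidden state carried over from the previous step.
rnn_cell = nn.RNNCell(input_size=32, hidden_size=64)

# A toy "sentence" of 5 words, each already embedded as a 32-dim vector.
sentence = torch.randn(5, 1, 32)  # (seq_len, batch, input_size)

hidden = torch.zeros(1, 64)       # initial hidden state: knows nothing yet
for word_vec in sentence:
    # The new hidden state mixes "everything read so far" with the current word.
    hidden = rnn_cell(word_vec, hidden)

# After the loop, `hidden` summarizes the whole sentence.
print(hidden.shape)  # torch.Size([1, 64])
```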

What is translation?

For thousands of years, translation has been a hard job. You must be proficient not only in your own language but also in the foreign one, and you must try to satisfy judges of very different levels who nonetheless all demand consistency, while living up to the classic standard of "faithfulness, expressiveness, and elegance." Let's look at translation in an abstract way. Suppose a person wants to translate a Chinese book, say The Analects of Confucius:

  1. He reads the sentence "Is it not a pleasure to learn, and to review what you have learned from time to time?"
  2. In his mind, the sentence forms something easy to understand and intuitive: I study, and then I regularly review what I learned before, and that feels pretty great.
  3. Still in his mind, he tries to match English words to that intuitive meaning, for example: "study every day and review regularly, isn't it something delightful?"
  4. Finally, he decides to polish the draft, replacing the simple junior-high vocabulary and high-school grammar above with GRE/GMAT vocabulary, one word at a time, and a grammar apparently designed so that no one can understand it.

This process is easy to capture in a flowchart: I receive this sentence -> I understand this sentence -> I map this sentence -> I output this sentence.

Encoder - Decoder

With the foreshadowing above, the model design here is much more straightforward. Our inputs and outputs are natural languages we can understand, such as Chinese, English, or Arabic, while a computer processes numbers and vectors. We will skip word vectors here and simply assume we have already obtained a word-by-word vector representation of the sentence to be translated.
We need two RNNs, as the title of this section suggests: one named the Encoder and its sibling the Decoder. In plain language, this works much like signal traffic during World War II: the sender encrypts a message and transmits it, and the receiver uses an agreed method to recover it.
The Encoder's task is to run the RNN we described before and produce a semantic vector, which becomes one of the Decoder's inputs. This corresponds to the first two steps above: receiving the sentence and understanding it.
The Decoder's task is to unpack that semantic vector and emit text in the target language. This corresponds to the last two steps: mapping each piece of the meaning and outputting the target sentence. A small code sketch after the figure below makes this division of labor concrete.
[Figure: a seq2seq encoder-decoder translating English to French]
In the figure, the left side is the encoder reading English and the right side is the decoder producing French. For the RNN itself, we usually choose an LSTM or a GRU.
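Here is a minimal encoder-decoder sketch in PyTorch using GRUs. All dimensions, vocabulary sizes, and class/variable names are invented for illustration; a real translation model would add padding, teacher forcing during training, and more:

```python
import torch
import torch.nn as nn

VOCAB_SRC, VOCAB_TGT, EMB, HID = 1000, 1200, 32, 64  # toy sizes

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SRC, EMB)
        self.gru = nn.GRU(EMB, HID)

    def forward(self, src):
        # src: (seq_len, batch) of source-token ids
        _, h = self.gru(self.embed(src))
        return h  # (1, batch, HID): the "semantic vector"

class Decoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_TGT, EMB)
        self.gru = nn.GRU(EMB, HID)
        self.out = nn.Linear(HID, VOCAB_TGT)

    def forward(self, tgt_token, h):
        # One decoding step: previous target token + current state -> next-word scores.
        o, h = self.gru(self.embed(tgt_token), h)
        return self.out(o), h

# Encode the source sentence, then decode the target word by word.
enc, dec = Encoder(), Decoder()
src = torch.randint(0, VOCAB_SRC, (7, 1))    # 7 source tokens, batch of 1
state = enc(src)                             # semantic vector from the encoder
token = torch.zeros(1, 1, dtype=torch.long)  # assume id 0 is the <sos> token
for _ in range(10):                          # emit up to 10 target tokens
    scores, state = dec(token, state)
    token = scores.argmax(-1)                # greedy pick of the next word
```

Notice that the only thing the decoder ever receives from the encoder is `state`, a single fixed-size vector that must carry the whole sentence. That bottleneck is exactly what attention, the subject of this series, was later introduced to relax.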

Original post: blog.csdn.net/Petersburg/article/details/126046006