Implementation of seq2seq (1)

Application scenario

seq2seq is a commonly used model in natural language processing, underpinning tasks such as machine translation, text summarization, and dialogue generation (a language-model-plus-keywords approach to generation was implemented here before, but seq2seq is the proper way to do it). More advanced models are largely unified architectures that iterate on this basic one.

The model's principles will not be explained here; many blogs already cover them well. This post just writes up a weekend experiment applying seq2seq to machine translation, shared for peers' reference.

Well, seq2seq comes in several variants:
(1) The simplest: the encoder's final hidden-state vector is copied directly as the decoder's input at every step, so the decoder needs no input sequence of its own.
(2) The encoder's final hidden-state vector initializes the decoder's state, and the decoder additionally receives an input sequence that is the output sequence misaligned by one step, i.e. teacher forcing (see the sketch after this list).
(3) A combination of (1) and (2): the decoder consumes its input sequence and also refers to the encoder's final hidden-state vector at every step, giving it more information to draw on.
(4) When translating each target word, each input word actually contributes a different amount, and the encoder's single final hidden state cannot express that diversity, so attention replaces the fixed encoder hidden-state vector of (3).
These variants form one continuous line, each deepening the previous one.
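
To make the misalignment in (2) concrete, here is a minimal sketch of teacher forcing: the decoder input and target are the same sequence offset by one step. The toy sentence and the start/end markers are illustrative assumptions, not from the original code:

    # The decoder input is the target sequence shifted right by one step.
    # "<s>" and "</s>" are assumed start/end markers; real preprocessing may differ.
    target_sentence = ["<s>", "我", "爱", "你", "</s>"]
    decoder_input = target_sentence[:-1]   # ["<s>", "我", "爱", "你"]
    decoder_target = target_sentence[1:]   # ["我", "爱", "你", "</s>"]
    # At step t the decoder reads decoder_input[t] and learns to predict decoder_target[t].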

This post implements variant (2).

    # Module-level imports (omitted in the original post; tensorflow.keras assumed):
    from tensorflow.keras import layers
    from tensorflow.keras.models import Model

    def build_model(self):
        # Encoder: token ids -> embeddings -> LSTM, keeping the final h and c states.
        encoder_input = layers.Input(shape=(self.input_seq_len,))
        encoder_embedding = layers.Embedding(input_dim=len(self.en_word_id_dict),
                                             output_dim=self.encode_embeding_len,
                                             mask_zero=True  # id 0 is treated as padding
                                             )(encoder_input)
        encoder_lstm, state_h, state_c = layers.LSTM(units=self.encode_embeding_len,
                                                     return_state=True)(encoder_embedding)

        # The encoder's final states initialize the decoder: this is variant (2).
        encoder_state = [state_h, state_c]

        decoder_input = layers.Input(shape=(self.output_seq_len,))
        decoder_embedding = layers.Embedding(input_dim=len(self.ch_word_id_dict),
                                             output_dim=self.decode_embeding_len,
                                             mask_zero=True
                                             )(decoder_input)
        # The decoder LSTM must have the same unit count as the encoder so the
        # initial states fit; return_sequences=True gives one output per time step.
        decoder_lstm, _, _ = layers.LSTM(units=self.encode_embeding_len,
                                         return_state=True,
                                         return_sequences=True)(decoder_embedding, initial_state=encoder_state)
        # Per-step softmax over the target (Chinese) vocabulary.
        decoder_out = layers.Dense(len(self.ch_word_id_dict), activation="softmax")(decoder_lstm)

        model = Model([encoder_input, decoder_input], decoder_out)
        # categorical_crossentropy expects one-hot targets; the commented sparse
        # variant would accept integer ids instead.
        model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
        # model.compile(optimizer='rmsprop', loss='sparse_categorical_crossentropy')
        model.summary()
        return model
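
Here is a minimal training sketch under stated assumptions: model is the object returned by build_model above, and the sizes, the random integer data, and the shift-by-one target construction are all invented for illustration. With categorical_crossentropy the targets must be one-hot encoded over the target vocabulary (the commented sparse variant would take integer ids directly):

    import numpy as np
    from tensorflow.keras.utils import to_categorical

    # Hypothetical sizes for illustration; the real values come from the corpus
    # and must match what build_model was configured with.
    n_samples, input_seq_len, output_seq_len = 64, 10, 12
    en_vocab_size, ch_vocab_size = 500, 600

    # Random padded id sequences stand in for real tokenized sentence pairs
    # (id 0 is reserved for padding because of mask_zero=True).
    enc_in = np.random.randint(1, en_vocab_size, size=(n_samples, input_seq_len))
    dec_in = np.random.randint(1, ch_vocab_size, size=(n_samples, output_seq_len))
    dec_target = np.roll(dec_in, -1, axis=1)  # target = decoder input shifted left one step
    dec_target[:, -1] = 0                     # pad the final position

    # One-hot encode targets to match categorical_crossentropy.
    dec_target_onehot = to_categorical(dec_target, num_classes=ch_vocab_size)

    model.fit([enc_in, dec_in], dec_target_onehot, batch_size=16, epochs=1)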

