Intensive Reading of Transformer Papers

Table of contents

Abstract

Conclusion

Introduction

Background

Training


Abstract

The abstract describes a sequence transduction model in which an attention mechanism connects the encoder and the decoder. Two machine translation experiments are run, and the results are better than those of other models.

Conclusion

The Transformer is the first sequence transduction model based entirely on the attention mechanism. On machine translation tasks, the Transformer trains much faster than other architectures and achieves better results.

Introduction

This section describes the shortcomings of RNNs: all earlier information is folded into a single hidden state that has to be computed one time step after another, so the computation cannot be parallelized over time and performance is relatively poor.
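To make that bottleneck concrete, here is a minimal sketch (not from the paper; the function and variable names are illustrative) of a vanilla RNN forward pass, where each hidden state depends on the previous one and the loop over time cannot be parallelized:

```python
import torch

def rnn_forward(x, W_xh, W_hh, b_h):
    """Vanilla RNN: h_t depends on h_{t-1}, so the time loop is
    strictly sequential and cannot be parallelized across steps."""
    seq_len, _ = x.shape
    h = torch.zeros(W_hh.shape[0])
    outputs = []
    for t in range(seq_len):                      # one step at a time
        h = torch.tanh(x[t] @ W_xh + h @ W_hh + b_h)
        outputs.append(h)
    return torch.stack(outputs)                   # (seq_len, d_hidden)
```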

It then explains that the Transformer no longer uses recurrent layers but is based purely on the attention mechanism, so the degree of parallelism is much higher and better results can be reached in a shorter training time.
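The core operation is the scaled dot-product attention defined in the paper, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. A short PyTorch sketch (tensor shapes are assumptions for illustration) shows that all positions are handled by a single matrix product, with no loop over time:

```python
import math
import torch

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.
    One matmul scores every query against every key, so the whole
    sequence is processed in parallel."""
    d_k = Q.shape[-1]
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)
    weights = torch.softmax(scores, dim=-1)       # attention weights
    return weights @ V
```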

Background

Earlier work proposed replacing recurrent neural networks with convolutional neural networks to reduce sequential computation. It also noted that a convolution has multiple output channels, and each output channel can be seen as recognizing a different pattern.
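In the paper, multi-head attention plays a role analogous to the multiple output channels of a convolution: each head has its own projections and can learn a different pattern. Below is a rough, hypothetical sketch of that idea (the dimensions, weight names, and single-sequence layout are my assumptions, not the paper's code):

```python
import math
import torch

def multi_head_attention(x, W_q, W_k, W_v, W_o, num_heads):
    """Split d_model into num_heads sub-spaces, run attention in each
    head, then concatenate and project. Each head can capture a
    different pattern, much like one output channel of a convolution."""
    seq_len, d_model = x.shape
    d_head = d_model // num_heads

    def split(t):  # (seq_len, d_model) -> (num_heads, seq_len, d_head)
        return t.view(seq_len, num_heads, d_head).transpose(0, 1)

    Q, K, V = split(x @ W_q), split(x @ W_k), split(x @ W_v)
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d_head)
    heads = torch.softmax(scores, dim=-1) @ V     # per-head attention
    heads = heads.transpose(0, 1).reshape(seq_len, d_model)
    return heads @ W_o                            # final output projection
```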

Training

Source and processing of training data sets

Hardware: the model was trained on 8 NVIDIA P100 GPUs for 12 hours.

Regularization: a large number of dropout layers is used to regularize the model.
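Concretely, the paper applies dropout to the output of every sub-layer before it is added to the residual input and normalized. A minimal PyTorch sketch of that pattern (the class name is mine; the default rate of 0.1 follows the base-model setting, and the code itself is illustrative):

```python
import torch
import torch.nn as nn

class ResidualDropout(nn.Module):
    """Dropout on a sub-layer's output, then residual add and layer
    norm -- the pattern wrapped around every attention and
    feed-forward sub-layer."""
    def __init__(self, d_model, p_drop=0.1):
        super().__init__()
        self.dropout = nn.Dropout(p_drop)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x, sublayer):
        # x: (seq_len, d_model); sublayer: a callable such as attention or FFN
        return self.norm(x + self.dropout(sublayer(x)))
```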

Source: blog.csdn.net/weixin_64443786/article/details/131879330