Understanding the Transformer

Let's learn about the Transformer from three angles: 1. What is the Transformer? 2. What is the definition of the Transformer? 3. Why use the Transformer?

The Transformer network architecture was proposed by Ashish Vaswani and colleagues in the paper Attention Is All You Need and was first applied to Google's machine translation tasks. Unlike earlier models, it does not rely on RNN or CNN architectures; instead, it is built on an attention mechanism. The model is now widely used across NLP, in tasks such as machine translation, question answering, text summarization, and speech recognition.

1. What is the Transformer?

In the field of natural language processing (NLP), the emergence of the Transformer model has brought about tremendous change. This deep learning architecture, proposed by Google in 2017, has gradually become the mainstream solution for NLP tasks thanks to its powerful representation capabilities and efficient processing speed. In this blog, we will look at the working principle, advantages, and applications of the Transformer model in NLP.

2. What is the definition of the Transformer?

Simply put, the Transformer model consists of two parts, an encoder (Encoder) and a decoder (Decoder), each of which is a stack of several identical layers. Each layer contains a multi-head self-attention sub-layer (Multi-Head Self-Attention) and a feed-forward network sub-layer (Feed-Forward Neural Network). The encoder converts the input sequence into context vectors, and the decoder uses these context vectors to generate the output sequence.
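To make this concrete, here is a minimal sketch of a single encoder layer in PyTorch. The module choices, dimensions, and the post-norm residual ordering are illustrative assumptions, not a faithful reproduction of any particular implementation.

```python
import torch
import torch.nn as nn

class EncoderLayer(nn.Module):
    """One Transformer encoder layer: a multi-head self-attention sub-layer and a
    feed-forward sub-layer, each wrapped with a residual connection and layer norm."""
    def __init__(self, d_model=512, n_heads=8, d_ff=2048, dropout=0.1):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads,
                                               dropout=dropout, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        # Multi-head self-attention sub-layer: every position attends to all others.
        attn_out, _ = self.self_attn(x, x, x)
        x = self.norm1(x + self.dropout(attn_out))
        # Position-wise feed-forward sub-layer.
        x = self.norm2(x + self.dropout(self.ffn(x)))
        return x

# Toy batch: 2 sequences of 10 tokens, each token a 512-dim embedding.
x = torch.randn(2, 10, 512)
print(EncoderLayer()(x).shape)  # torch.Size([2, 10, 512])
```

In the full model, several such layers are stacked to form the encoder, and the decoder layers add an extra attention sub-layer over the encoder's context vectors.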

3. Why use the Transformer?

Capture global information: The multi-head self-attention sublayer allows the model to focus on multiple positions in the input sequence simultaneously, thereby capturing global information. This helps solve some NLP tasks that rely on global information, such as summarization, machine translation, etc.
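As a small illustration of how self-attention sees the whole input at once, the sketch below computes scaled dot-product attention for a toy sequence; the tensor sizes are arbitrary and the multi-head projections are omitted for brevity.

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    """q, k, v: (seq_len, d_k). Each row of the returned weight matrix sums to 1,
    so every output position is a mixture of *all* input positions."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5   # (seq_len, seq_len)
    weights = F.softmax(scores, dim=-1)
    return weights @ v, weights

seq = torch.randn(6, 64)                 # 6 tokens, 64-dim each
out, w = scaled_dot_product_attention(seq, seq, seq)
print(w.shape)   # torch.Size([6, 6]): every token attends to every other token
```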

Efficient parallel computing: Since the Transformer model is based on matrix multiplication operations, it can make good use of the GPU to accelerate calculations and improve processing efficiency. This allows Transformer to have better parallelism and shorter training time when processing long texts.
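A rough sketch of this parallelism (all sizes here are made up): the query, key, and value projections and the attention scores for every position of every sequence are produced by a handful of large matrix multiplications, with no per-timestep loop, so the work maps naturally onto a GPU.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
batch, seq_len, d_model = 32, 512, 512

x = torch.randn(batch, seq_len, d_model, device=device)
w_q = torch.randn(d_model, d_model, device=device)
w_k = torch.randn(d_model, d_model, device=device)
w_v = torch.randn(d_model, d_model, device=device)

# All positions of all sequences are projected in one matmul each --
# there is no loop over time steps, so the GPU can process them in parallel.
q, k, v = x @ w_q, x @ w_k, x @ w_v
scores = q @ k.transpose(-2, -1) / d_model ** 0.5   # (32, 512, 512)
out = torch.softmax(scores, dim=-1) @ v
print(out.shape)  # torch.Size([32, 512, 512])
```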

No need to explicitly use loop structures: In traditional recurrent neural networks (RNN), complex sequence dependencies need to be handled using loop structures. In the Transformer model, this dependency is captured through the self-attention mechanism and feed-forward neural network, without the need for explicit loop structures.
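The contrast can be seen directly in code. In this minimal, hedged comparison, an RNN cell must be stepped through the sequence with an explicit Python loop, while a Transformer encoder layer consumes the whole sequence in one call; the module choices and sizes are illustrative only.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 100, 256)  # 1 sequence, 100 time steps, 256 features

# RNN: hidden states must be computed one time step after another.
cell = nn.RNNCell(256, 256)
h = torch.zeros(1, 256)
for t in range(x.size(1)):          # explicit loop over the sequence
    h = cell(x[:, t, :], h)

# Transformer encoder layer: the whole sequence is processed in one call,
# with dependencies handled by self-attention instead of recurrence.
enc = nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True)
out = enc(x)                        # no loop over time steps
print(h.shape, out.shape)
```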

Better handling of long sequences: Traditional RNNs are prone to vanishing or exploding gradients when processing long sequences. The Transformer model handles long-sequence information better through its multi-head self-attention mechanism and feed-forward network.

Powerful representation ability: The Transformer model can capture rich language features and thereby better understand natural language. This is why the Transformer performs well in language modeling and other natural language processing tasks.

1. Traditional RNN network

2. Transformer overall architecture

Comparison between Transformer and CNN
Each layer of a CNN captures only local information, so many layers must be stacked to obtain a larger receptive field. A Transformer layer needs no such stacking and can obtain global information directly, as the sketch after this comparison illustrates.

The disadvantages of the Transformer are its large number of parameters and its demanding training requirements. In addition, the Transformer must compare the features of every position against every other position, which adds to the computational cost.
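For a back-of-the-envelope sense of the receptive-field point, the short sketch below assumes stacked 1-D convolutions with kernel size 3, stride 1, and no dilation; the numbers are illustrative only.

```python
# Receptive field of stacked convolutions (kernel 3, stride 1, no dilation):
# each additional layer widens the field by 2, so depth must grow with the
# amount of context needed. One self-attention layer already spans all tokens.
kernel = 3
for layers in (1, 2, 4, 8, 16):
    receptive_field = 1 + layers * (kernel - 1)
    print(f"{layers:2d} conv layers -> receptive field {receptive_field} tokens")
```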
