Attention Is All You Need: the core idea of the Transformer

Introduction

In this blog post, I will discuss one of the most influential papers of this century, “Attention Is All You Need” (Vaswani et al.). First, I will introduce the self-attention mechanism, and then turn to the architectural details of the Transformer. In my last blog post, From Seq2Seq to Attention: Revolutionizing Sequence Modeling, I discussed the origins of the attention mechanism and Bahdanau attention; this post builds directly on that material, so if you haven’t read it yet, I recommend starting there. Bahdanau attention uses two RNNs together with an attention mechanism that assigns weights to the hidden states of the encoder. In “Attention Is All You Need”, the authors removed the RNNs entirely, introducing a new architecture that relies not on recurrence but solely on self-attention. Let’s first explain what the self-attention mechanism is.

The self-attention mechanism

The self-attention mechanism enables the model to capture dependencies between different positions in a sequence by attending to all positions simultaneously. In the last blog, we discussed using queries and key-value pairs to compute attention scores; an attention score measures the importance, or relevance, of each key-value pair to a given query. Self-attention extends this idea so that it operates within a single sequence, without requiring any external input: the queries, keys, and values are all derived from the same sequence.
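For reference, the paper summarizes this computation as scaled dot-product attention, where Q, K, and V are the query, key, and value matrices and d_k is the dimension of the keys (the scaling factor is discussed below):

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V$$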

[Figure: the self-attention computation, from the input x through the weight matrices Wq, Wk, Wv to the attention weights]
The figure above shows the self-attention mechanism; let me explain it from left to right. First, we have an input x. We multiply this input by the trainable weight matrices (Wq, Wk, Wv), and as output we get the query, key, and value matrices. We then use the query and key matrices to measure their similarity. The figure shows a plain dot product, but in the Transformer architecture we also scale it by √d_k. The output of this scaled dot product, passed through a softmax, gives the attention weights.
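To make the walk-through concrete, here is a minimal NumPy sketch of the computation in the figure. The sequence length, dimensions, and random weights below are made up for illustration; in a real Transformer, Wq, Wk, and Wv would be learned during training.

import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax: subtract the row max before exponentiating.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# Toy sizes (hypothetical): 4 tokens, model width 8, key/value width 8.
seq_len, d_model, d_k = 4, 8, 8
rng = np.random.default_rng(0)

x = rng.normal(size=(seq_len, d_model))   # input sequence, one row per position

W_q = rng.normal(size=(d_model, d_k))     # trainable weight matrices
W_k = rng.normal(size=(d_model, d_k))
W_v = rng.normal(size=(d_model, d_k))

Q = x @ W_q                               # queries
K = x @ W_k                               # keys
V = x @ W_v                               # values

scores = Q @ K.T / np.sqrt(d_k)           # scaled dot product of queries and keys
weights = softmax(scores, axis=-1)        # attention weights, each row sums to 1
output = weights @ V                      # weighted sum of the values

print(weights.shape)  # (4, 4): one weight per (query position, key position) pair
print(output.shape)   # (4, 8): one context vector per position

Each row of weights tells us how strongly that position attends to every position in the sequence, including itself; multiplying by V turns those weights into a context vector for each position.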

Source: blog.csdn.net/iCloudEnd/article/details/132773367