Introduction to the Self-Attention Mechanism in Transformers: Attention Is All You Need

"Attention is All You Need" is a 2017 research paper by Google researchers that introduced the Transformer model, a revolutionary architecture that revolutionized the field of Natural Language Processing (NLP), And became the basis of LLMs as we know them now - such as GPT, PaLM and other models. This paper proposes a neural network architecture that replaces traditional recurrent neural networks (RNNs) and convolutional neural networks (CNNs) with a fully attention-based mechanism.

The Transformer uses self-attention to compute representations of input sequences, which lets it capture long-range dependencies and process all positions in parallel. The authors demonstrate that the model achieves state-of-the-art results on several machine translation tasks and outperforms previous models based on RNNs or CNNs.
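To make the mechanism concrete, here is a minimal NumPy sketch of scaled dot-product self-attention. The function and variable names (`self_attention`, `W_q`, `W_k`, `W_v`) and the toy shapes are illustrative assumptions, not code from the paper.

```python
# Minimal sketch of scaled dot-product self-attention (illustrative shapes/names).
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """X: (seq_len, d_model); W_q, W_k, W_v: (d_model, d_k) projection matrices."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v           # project inputs to queries, keys, values
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # similarity of every position with every other
    weights = softmax(scores, axis=-1)            # attention distribution for each position
    return weights @ V                            # each output is a weighted sum of all values

# Toy usage: 5 tokens, model width 16, head width 8 (hypothetical sizes)
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 16))
W_q, W_k, W_v = (rng.normal(size=(16, 8)) for _ in range(3))
print(self_attention(X, W_q, W_k, W_v).shape)    # (5, 8)
```

Because every position attends to every other position in a single matrix multiplication, long-range dependencies are handled without stepping through the sequence one token at a time, which is what makes the computation easy to parallelize.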

The Transformer architecture consists of an encoder and a decoder, each built from a stack of identical layers. Each encoder layer consists of two sublayers: a multi-head self-attention mechanism and a position-wise feed-forward network (decoder layers add a third sublayer that attends over the encoder output). Multi-head self-attention lets the model attend to different parts of the input sequence from different representation subspaces, while the feed-forward network applies the same fully connected layers to each position separately and identically.
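The two sublayers can be sketched as follows, again in illustrative NumPy: multi-head attention implemented by splitting the model dimension into `h` slices, and a position-wise feed-forward network. The names, shapes, and the head count are assumptions for illustration, not the paper's reference code.

```python
# Sketch of the two encoder sublayers: multi-head self-attention and the
# position-wise feed-forward network (illustrative shapes and names).
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(X, W_q, W_k, W_v, W_o, h):
    """X: (seq_len, d_model); W_q/W_k/W_v/W_o: (d_model, d_model); h: number of heads."""
    seq_len, d_model = X.shape
    d_k = d_model // h
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    # Reshape to (h, seq_len, d_k) so each head attends independently.
    split = lambda M: M.reshape(seq_len, h, d_k).transpose(1, 0, 2)
    Qh, Kh, Vh = split(Q), split(K), split(V)
    scores = Qh @ Kh.transpose(0, 2, 1) / np.sqrt(d_k)   # (h, seq_len, seq_len)
    heads = softmax(scores) @ Vh                          # (h, seq_len, d_k)
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ W_o                                   # final output projection

def position_wise_ffn(X, W1, b1, W2, b2):
    # The same two-layer ReLU network is applied to every position independently.
    return np.maximum(0, X @ W1 + b1) @ W2 + b2
```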

Transformer models also use residual connections and layer normalization around each sublayer to stabilize training, along with dropout for regularization. Furthermore, the authors introduce a positional encoding scheme that encodes the position of each token in the input sequence, enabling the model to capture word order without any recurrent or convolutional operations.
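Below is a rough sketch of the sinusoidal positional encoding described in the paper, together with the residual-plus-layer-norm pattern applied around each sublayer. The function names and the assumption of an even d_model are illustrative.

```python
# Sketch of sinusoidal positional encoding and the add-and-norm pattern
# (illustrative; assumes an even d_model).
import numpy as np

def positional_encoding(seq_len, d_model):
    """PE[pos, 2i] = sin(pos / 10000^(2i/d_model)), PE[pos, 2i+1] = cos(same angle)."""
    pos = np.arange(seq_len)[:, None]               # (seq_len, 1)
    i = np.arange(0, d_model, 2)[None, :]           # (1, d_model/2)
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

def layer_norm(x, eps=1e-6):
    # Normalize each position's features to zero mean and unit variance.
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True)
    return (x - mean) / (std + eps)

def add_and_norm(x, sublayer_output):
    # Residual connection followed by layer normalization: LayerNorm(x + Sublayer(x)).
    return layer_norm(x + sublayer_output)

# The positional encoding is simply added to the token embeddings before the first layer.
pe = positional_encoding(seq_len=50, d_model=16)
print(pe.shape)  # (50, 16)
```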

You can read the Transformer paper at the arXiv link in the references below.

References

  • https://www.coursera.org/learn/generative-ai-with-llms/supplement/Il7wV/transformers-attention-is-all-you-need
  • https://arxiv.org/abs/1706.03762


Reprinted from: blog.csdn.net/zgpeace/article/details/132392269