Introduction

When encoding a sequence (e.g. in a seq2seq model), there are three common ways to model it:

  1. An RNN struggles to learn global structural information because its recurrence is essentially a Markov-style process: each hidden state $h_t = f(h_{t-1}, x_t)$ depends only on the previous state and the current token, so distant positions interact only through a long chain of steps.
  2. The CNN approach is also very natural: slide a window over the sequence, e.g. a convolution of kernel size 3, so each output position sees only a local context (a minimal sketch follows this list). Capturing global dependencies requires stacking many layers to grow the receptive field.
  3. Google proposed attention in "Attention is All You Need": every position attends directly to every other position, so global structure is visible within a single layer.
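
To make the window idea in item 2 concrete, here is a minimal NumPy sketch of a size-3 convolution over a sequence (the `conv1d` name and the shapes are illustrative assumptions, not from the original post):

```python
import numpy as np

def conv1d(x, w):
    """Naive 1D convolution over a token sequence.

    x: (seq_len, d_in) token embeddings
    w: (k, d_in, d_out) kernel, e.g. k = 3
    Each output position sees only a local window of k tokens,
    so global structure requires stacking many such layers.
    """
    k, d_in, d_out = w.shape
    n_out = x.shape[0] - k + 1
    out = np.zeros((n_out, d_out))
    for t in range(n_out):
        out[t] = np.einsum("kd,kde->e", x[t:t + k], w)  # mix the k-token window
    return out

# Example: a length-10 sequence, embedding size 4, kernel size 3.
x = np.random.randn(10, 4)
w = np.random.randn(3, 4, 8)
print(conv1d(x, w).shape)  # (8, 8)
```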

The attention process: each query is compared against all keys, and the softmax-normalized similarities weight a sum over the values:

$$\text{Attention}(Q, K, V) = \operatorname{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V$$
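
A minimal NumPy sketch of scaled dot-product attention as defined in "Attention is All You Need" (the function names and shapes here are illustrative):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.

    Q: (n, d_k) queries, K: (m, d_k) keys, V: (m, d_v) values.
    Every query position attends to every key position directly,
    so global dependencies need no recurrence or stacked windows.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # (n, m) pairwise similarities
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V                  # (n, d_v) weighted sum of values
```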

Reference:
1. The Attention mechanism in NLP + a detailed explanation of the Transformer
2. A brief reading of "Attention is All You Need"
3. [Exploration of Linear Attention: Does Attention Have to Have a Softmax?](https://kexue.fm/archives/7546)
