Learning Transformers from scratch

Disclaimer: the article links below are collected only as a personal study memo.

Basic knowledge

1: A from-scratch analysis tutorial [recommended]
https://zhuanlan.zhihu.com/p/609271490

2: A detailed explanation of the Transformer [recommended]
https://wmathor.com/index.php/archives/1438/

3: How to understand the Transformer, step by step from the basics?
https://www.zhihu.com/question/471328838/answer/3011638037

4: A detailed explanation of the Transformer model (the most complete illustrated version) [recommended]

A detailed explanation of the Transformer model (the most complete illustrated version) - Zhihu

5: A 10,000-word article interpreting the Transformer model and the Attention mechanism [recommended]

[Classic intensive reading] A 10,000-word article explaining the Transformer model and the Attention mechanism - Zhihu

Common questions

1: Why do the K and V in the Transformer decoder come from the output of the encoder? (see the sketch after the link)

https://www.zhihu.com/question/458687952
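The gist: in the decoder's cross-attention, the queries come from the decoder states while the keys and values are projected from the encoder output, so every target position looks up relevant source positions. A minimal single-head sketch in PyTorch (the dimensions and layer names are illustrative, not taken from the linked answers):

```python
import torch
import torch.nn.functional as F

d_model = 8
enc_out = torch.randn(5, d_model)   # encoder output: 5 source positions
dec_in = torch.randn(3, d_model)    # decoder hidden states: 3 target positions

W_q = torch.nn.Linear(d_model, d_model, bias=False)
W_k = torch.nn.Linear(d_model, d_model, bias=False)
W_v = torch.nn.Linear(d_model, d_model, bias=False)

Q = W_q(dec_in)    # queries from the decoder side
K = W_k(enc_out)   # keys from the encoder output
V = W_v(enc_out)   # values from the encoder output

scores = Q @ K.T / d_model ** 0.5          # (3, 5): each target position scores every source position
context = F.softmax(scores, dim=-1) @ V    # (3, d_model) context vectors fed onward in the decoder
print(context.shape)
```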

2: An explanation of Teacher Forcing, autoregressive decoding, and Exposure Bias (see the sketch after the link)

Thoughts about Teacher Forcing and Exposure Bias - Zhihu
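In short: teacher forcing feeds the ground-truth prefix to the decoder during training, autoregressive decoding feeds the model its own previous outputs at inference, and the mismatch between the two is exposure bias. A toy sketch, where `decode_step` is a hypothetical stand-in for a real decoder:

```python
import torch

def decode_step(prefix: torch.Tensor) -> torch.Tensor:
    """Hypothetical stand-in for a decoder: returns logits for the next token."""
    vocab_size = 10
    return torch.randn(vocab_size)

target = torch.tensor([3, 7, 2, 9])   # ground-truth output token ids
bos = torch.tensor([0])               # assumed <bos> token id

# Teacher forcing (training): every prefix is taken from the ground truth,
# so the steps are independent of each other and never see model mistakes.
tf_logits = [decode_step(torch.cat([bos, target[:t]])) for t in range(len(target))]

# Autoregressive decoding (inference): each prefix contains the model's own
# earlier predictions; errors can accumulate, which is exposure bias.
prefix = bos
for _ in range(len(target)):
    next_id = decode_step(prefix).argmax().reshape(1)
    prefix = torch.cat([prefix, next_id])
print(prefix)
```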

3: How is the decoder parallelized during training? (see the sketch after the links)

A brief analysis of parallel issues during Transformer training - Zhihu

A brief analysis of parallelism during Transformer training: where Transformer parallelism shows up - CSDN blog

Understanding masked attention in the Transformer decoder - Sili LZS's blog - CSDN blog
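The common thread in these posts: with teacher forcing plus a lower-triangular sequence mask, the whole target sequence can pass through decoder self-attention in a single forward pass, while each position still only attends to its own prefix. A minimal single-head sketch (shapes are illustrative):

```python
import torch
import torch.nn.functional as F

T, d = 4, 8
x = torch.randn(T, d)                        # decoder self-attention input for all T target positions
scores = x @ x.T / d ** 0.5                  # (T, T) scores computed for every position at once
mask = torch.tril(torch.ones(T, T)).bool()   # position t may attend only to positions 0..t
scores = scores.masked_fill(~mask, float("-inf"))
attn = F.softmax(scores, dim=-1)             # the upper triangle becomes exactly 0
out = attn @ x                               # all T outputs produced in parallel
print(attn)
```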

4: Why does the Transformer decoder still need the seq mask during testing or prediction? (see the sketch after the link)

When testing or predicting, why does the decoder in the Transformer still need the seq mask? - Zhihu
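Roughly: at inference the whole generated prefix is normally fed back through the decoder at every step, and keeping the same causal mask as in training ensures each position's representation is identical to what the model saw during training; without it, earlier positions would attend to later tokens and the deeper-layer states would diverge. A hypothetical greedy loop illustrating this (stand-in embeddings, no real model):

```python
import torch

def masked_self_attn(x: torch.Tensor) -> torch.Tensor:
    """Causal self-attention over a (T, d) prefix, using the same mask as in training."""
    T, d = x.shape
    scores = x @ x.T / d ** 0.5
    mask = torch.tril(torch.ones(T, T)).bool()
    scores = scores.masked_fill(~mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ x

# Hypothetical greedy loop: the prefix grows by one embedding per step, and the
# full prefix is re-encoded with the causal mask each step, so every position's
# state stays consistent with training-time behavior.
d_model = 8
prefix = torch.randn(1, d_model)         # stand-in for the <bos> embedding
for _ in range(3):
    h = masked_self_attn(prefix)         # (t, d_model); only h[-1] drives the next token
    next_emb = torch.randn(1, d_model)   # stand-in for the embedding of the chosen token
    prefix = torch.cat([prefix, next_emb], dim=0)
print(prefix.shape)
```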

An in-depth understanding of the Transformer source code - Team Zhao's blog - CSDN blog


Source: blog.csdn.net/lilai619/article/details/131410327