Teacher Huaxiaozhuan has updated his personal notes, adding a Transformer section that covers XLNet, MT-DNN, ERNIE, ERNIE 2.0, RoBERTa, and more. The content is very detailed and well worth studying, so I recommend it.
The author was a senior algorithm engineer at Alibaba and chief algorithm researcher at Zhiyi Technology; he is currently a senior researcher at Tencent and the author of "Python vs. Machine Learning". Teacher Hua is also a guest of our Knowledge Planet community.
These notes are the author's study summaries accumulated over many years, now organized and open-sourced for everyone.
Note address:
http://www.huaxiaozhuan.com/Deep Learning/chapters/7_Transformer.html
Introduction to Transformer
Transformer is a feature extractor based on the attention mechanism that can replace CNNs and RNNs for extracting sequence features.
Transformer was first proposed in the paper "Attention Is All You Need", where it is used in an encoder-decoder architecture. In practice, the Transformer encoder or decoder can also be used on its own.
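The core of the Transformer described above is scaled dot-product attention from "Attention Is All You Need": each position's output is a weighted sum of value vectors, with weights given by a softmax over query-key similarities. Below is a minimal NumPy sketch of that single operation (self-attention with Q = K = V); the function name and toy shapes are my own illustration, not code from the notes.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V.

    Q, K: arrays of shape (seq_len, d_k); V: shape (seq_len, d_v).
    Returns attended values of shape (seq_len, d_v).
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                      # query-key similarities
    scores -= scores.max(axis=-1, keepdims=True)         # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)       # row-wise softmax
    return weights @ V                                   # weighted sum of values

# Toy self-attention: a sequence of 3 positions, model dimension 4.
rng = np.random.default_rng(0)
x = rng.standard_normal((3, 4))
out = scaled_dot_product_attention(x, x, x)              # Q = K = V = x
print(out.shape)  # (3, 4)
```

In the full model, Q, K, and V are learned linear projections of the input and the operation is repeated across multiple heads; this sketch shows only the attention step that lets the Transformer model pairwise interactions across the whole sequence without recurrence or convolution.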
Transformer notes directory
1. Transformer
- 1.1 Structure
- 1.2 Transformer vs. CNN vs. RNN
- 1.3 Experimental results
2. Universal Transformer
- 2.1 Structure
- 2.2 ACT
- 2.3 Experimental results
3. Transformer-XL
- 3.1 Segment-level recurrence
- 3.2 Relative positional encoding
- 3.3 Experimental results
4. GPT
- 4.1 GPT V1
- 4.2 GPT V2
5. BERT
- 5.1 Pre-training
- 5.2 Model structure
- 5.3 Fine-tuning
- 5.4 Performance
6. ERNIE
- 6.1 ERNIE 1.0
- 6.2 ERNIE 2.0
7. XLNet
- 7.1 Autoregressive vs. autoencoding language models
- 7.2 Permutation Language Model
- 7.3 Two-Stream Self-Attention
- 7.4 Partial Prediction
- 7.5 Incorporating Transformer-XL
- 7.6 Multiple inputs
- 7.7 Model comparison
- 7.8 Experiments
8. MT-DNN
- 8.1 Model
- 8.2 Experiments
9. BERT extensions
- 9.1 BERT-wwm-ext
- 9.2 RoBERTa
Note screenshots
Other
Teacher Huaxiaozhuan's personal website:
http://www.huaxiaozhuan.com/
Note address:
http://www.huaxiaozhuan.com/Deep Learning/chapters/7_Transformer.html
github: