- Google original BERT: https://github.com/google-research/bert
- brightmart version of RoBERTa: https://github.com/brightmart/roberta_zh
- Harbin Institute of Technology version of RoBERTa: https://github.com/ymcui/Chinese-BERT-wwm
- Google original ALBERT: https://github.com/google-research/ALBERT
- brightmart version of ALBERT: https://github.com/brightmart/albert_zh
- Converted ALBERT: https://github.com/bojone/albert_zh
- Huawei's NEZHA: https://github.com/huawei-noah/Pretrained-Language-Model/tree/master/NEZHA
- Self-developed language models: https://github.com/ZhuiyiTechnology/pretrained-models
- T5 model: https://github.com/google-research/text-to-text-transfer-transformer
- GPT2_ML: https://github.com/imcaspar/gpt2-ml
- Google original ELECTRA: https://github.com/google-research/electra
- Harbin Institute of Technology version of ELECTRA: https://github.com/ymcui/Chinese-ELECTRA
- CLUE version of ELECTRA: https://github.com/CLUEbenchmark/ELECTRA
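Most of the checkpoints above share the same underlying BERT-style Transformer encoder. As a minimal sketch of that shared architecture (assuming the Hugging Face `transformers` library, which none of the repositories above mandate; `vocab_size=21128` matches the standard Chinese BERT vocabulary, and the input IDs below are arbitrary illustrative values), one can instantiate the model shape without downloading any weights:

```python
# Minimal sketch: instantiate a BERT-base-sized encoder and run a dummy input.
# Assumes the `transformers` library; weights here are randomly initialized,
# so outputs are not meaningful -- this only illustrates the architecture.
import torch
from transformers import BertConfig, BertModel

config = BertConfig(
    vocab_size=21128,        # Chinese BERT vocabulary size
    hidden_size=768,         # "base" model width
    num_hidden_layers=12,
    num_attention_heads=12,
)
model = BertModel(config)
model.eval()

# Dummy token IDs: [CLS]=101 and [SEP]=102 framing one arbitrary token.
input_ids = torch.tensor([[101, 2769, 102]])
with torch.no_grad():
    outputs = model(input_ids)

# Last hidden states have shape (batch, sequence_length, hidden_size).
print(tuple(outputs.last_hidden_state.shape))
```

To use one of the released checkpoints instead of random weights, the usual route is converting it to the chosen library's format (several of the repositories above provide converted versions) and loading it with `from_pretrained`.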
- From Word Embedding to the BERT model: the history of pre-training techniques in natural language processing: https://zhuanlan.zhihu.com/p/49271699
- Attention: https://zhuanlan.zhihu.com/p/37601161
- The development of pre-training in natural language processing: from Word Embedding to the BERT model (condensed PPT version): https://mp.weixin.qq.com/s/LGJvvhotSg7XMn8mg3TZUw
- [NLP] Attention: principle and source-code analysis: https://zhuanlan.zhihu.com/p/43493999
- BERT installation and usage: https://www.cnblogs.com/nxf-rabbit75/p/11938504.html