- Google's original BERT: https://github.com/google-research/bert
- brightmart's RoBERTa: https://github.com/brightmart/roberta_zh
- HIT's (Harbin Institute of Technology) RoBERTa: https://github.com/ymcui/Chinese-BERT-wwm
- Google's original ALBERT [example]: https://github.com/google-research/ALBERT
- brightmart's ALBERT: https://github.com/brightmart/albert_zh
- Converted ALBERT: https://github.com/bojone/albert_zh
- Huawei's NEZHA: https://github.com/huawei-noah/Pretrained-Language-Model/tree/master/NEZHA
- In-house pretrained language models: https://github.com/ZhuiyiTechnology/pretrained-models
- T5 model: https://github.com/google-research/text-to-text-transfer-transformer
- GPT2_ML: https://github.com/imcaspar/gpt2-ml
- Google's original ELECTRA: https://github.com/google-research/electra
- HIT's ELECTRA: https://github.com/ymcui/Chinese-ELECTRA
- CLUE's ELECTRA: https://github.com/CLUEbenchmark/ELECTRA
- From Word Embedding to BERT: the evolution of pretraining techniques in NLP: https://zhuanlan.zhihu.com/p/49271699
- Attention: https://zhuanlan.zhihu.com/p/37601161
- The evolution of pretraining in NLP: from Word Embedding to BERT (condensed slide version): https://mp.weixin.qq.com/s/LGJvvhotSg7XMn8mg3TZUw
- [NLP] Attention: principles and source-code walkthrough: https://zhuanlan.zhihu.com/p/43493999