Transformer techniques, assembled

(Draft)

1. Learning method: continual multi-task learning (ERNIE 2.0)

2. Pre-training tasks: SOP (Sentence Order Prediction, from ALBERT), DLM (Dialogue Language Model, from ERNIE)
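For the SOP task, training pairs can be built by taking two consecutive text segments and either keeping or swapping their order. A minimal sketch (the function name and labeling convention are assumptions, not ALBERT's actual data pipeline):

```python
import random

def make_sop_example(seg_a, seg_b, rng):
    """Build one Sentence Order Prediction example from two consecutive
    segments: keep the original order (label 0) or swap them (label 1).
    Hypothetical helper, not ALBERT's actual preprocessing code."""
    if rng.random() < 0.5:
        return (seg_a, seg_b), 0   # original order
    return (seg_b, seg_a), 1       # swapped order
```

The model is then trained as a binary classifier on these labels, which (per the ALBERT paper) is a harder signal than NSP because both classes use segments from the same document.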

3. Pre-training corpus:

4. Data augmentation:

5. Memory / parameters / epochs:

5.1 Distillation: TinyBERT
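TinyBERT combines prediction-layer distillation (soft cross-entropy on temperature-softened logits) with intermediate-layer distillation (MSE between teacher and student hidden states). A plain-Python sketch of the two loss terms (the function names are assumptions, and the real model uses tensors plus a projection when dimensions differ):

```python
import math

def softmax(logits, T=1.0):
    """Temperature-softened softmax over a list of logits."""
    exps = [math.exp(l / T) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def soft_ce(teacher_logits, student_logits, T=2.0):
    """Soft cross-entropy between teacher and student distributions
    (prediction-layer distillation)."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))

def hidden_mse(teacher_h, student_h):
    """MSE between teacher and student hidden states for one layer
    (intermediate-layer distillation; assumes matching dimensions)."""
    n = len(teacher_h)
    return sum((t - s) ** 2 for t, s in zip(teacher_h, student_h)) / n
```

The loss is minimized when the student matches the teacher exactly, and the temperature T > 1 exposes the teacher's relative preferences among wrong classes.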

5.2 Pruning: adaptive attention span
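The adaptive-span idea is that each head learns how far back it needs to attend: a soft mask is 1 within the learned span z, then decays linearly to 0 over a ramp of width R. A sketch of that masking function (parameter names follow the paper's m_z(x), but this scalar version is an illustration, not the paper's code):

```python
def span_mask(distance, z, R=4.0):
    """Soft masking function for adaptive attention span: attention to a
    token at `distance` is kept fully within the learned span z, decays
    linearly over a ramp of width R, and is zero beyond z + R."""
    return min(max((R + z - distance) / R, 0.0), 1.0)
```

Because the mask is differentiable in z, each head's span can be learned by gradient descent with an L1 penalty on z, shrinking memory for heads that only need local context.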

Fine-tuning: adapters ("Parameter-Efficient Transfer Learning for NLP"); top-K
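The adapter from that paper is a small bottleneck inserted into each layer: down-project, nonlinearity, up-project, plus a residual connection, so only the adapter weights are trained. A toy plain-Python sketch (list-based matrices instead of tensors; names are assumptions):

```python
def adapter(h, W_down, W_up):
    """Bottleneck adapter (Houlsby et al. style): down-project the hidden
    vector, apply ReLU, up-project, and add the residual. `h` is a list of
    floats; W_down / W_up are lists of rows. Toy sketch, not library code."""
    def matvec(W, x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in W]
    z = [max(0.0, v) for v in matvec(W_down, h)]   # bottleneck activation
    out = matvec(W_up, z)
    return [hi + oi for hi, oi in zip(h, out)]
```

Initializing W_up near zero makes the adapter start as an identity function, which is the paper's trick for stable fine-tuning.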

Dynamic halting:
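Dynamic halting usually refers to ACT-style adaptive computation (as in the Universal Transformer): each step emits a halting probability, and computation stops once the accumulated probability crosses a threshold, with the final step taking the remainder. A sketch of that accumulation rule (function name and list-based interface are assumptions):

```python
def act_halting(halting_probs, threshold=0.99):
    """ACT-style dynamic halting: accumulate per-step halting probabilities
    and stop once the running sum crosses the threshold; the last executed
    step receives the remainder so the step weights sum to 1."""
    weights, total = [], 0.0
    for p in halting_probs:
        if total + p >= threshold:
            weights.append(1.0 - total)  # remainder goes to the final step
            break
        weights.append(p)
        total += p
    return weights
```

The returned weights can be used to average the per-step states, so easy tokens spend fewer steps than hard ones.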

Sparse attention:

Position embedding:
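One common variant worth noting here is the original Transformer's fixed sinusoidal position embedding: even dimensions use sine, odd dimensions use cosine, with wavelengths forming a geometric progression. A small sketch (per-position version; the function name is an assumption):

```python
import math

def sinusoidal_pe(pos, d_model):
    """Sinusoidal position embedding from the original Transformer paper:
    PE(pos, 2i)   = sin(pos / 10000^(2i/d_model))
    PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))"""
    pe = []
    for i in range(d_model):
        angle = pos / (10000 ** (2 * (i // 2) / d_model))
        pe.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return pe
```

Because the encoding is a fixed function of position, it needs no learned parameters and extrapolates to sequence lengths unseen in training.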

Parameter sharing:
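Cross-layer parameter sharing, as in ALBERT, reuses one set of layer weights for every layer of the stack, so parameter count stays constant as depth grows. A minimal sketch of the control flow (the `layer` callable stands in for a full Transformer block; names are assumptions):

```python
def shared_stack(h, layer, n_layers=12):
    """ALBERT-style cross-layer parameter sharing: apply the SAME layer
    (same weights) n_layers times instead of n_layers distinct layers."""
    for _ in range(n_layers):
        h = layer(h)
    return h
```

This is what cuts ALBERT's parameter count so sharply relative to BERT at the same depth, at the cost of some per-layer expressiveness.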


Reprinted from blog.csdn.net/dragonchow123/article/details/104870194