model training
GPT-2/GPT and causal language modeling
model used
AutoModelForCausalLM
[examples/pytorch/language-modeling#gpt-2gpt-and-causal-language-modeling]
[examples/pytorch/language-modeling/run_clm.py]
Example:
[colab.research.google.com/Causal Language modeling]
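The CLM script is typically launched from the command line. A minimal sketch, following the pattern in the examples README (dataset name and output directory are placeholders to adapt):

```shell
python run_clm.py \
    --model_name_or_path gpt2 \
    --dataset_name wikitext \
    --dataset_config_name wikitext-2-raw-v1 \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 8 \
    --do_train \
    --do_eval \
    --output_dir /tmp/test-clm
```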
RoBERTa/BERT/DistilBERT and masked language modeling
[examples/pytorch/language-modeling#robertabertdistilbert-and-masked-language-modeling]
[examples/pytorch/language-modeling/run_mlm.py]
model used
AutoModelForMaskedLM, specifically BertForMaskedLM.
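For intuition, the masking rule that the MLM data collator applies can be sketched in plain Python. This is a simplified illustration, not the library implementation; `MASK_ID` and `VOCAB_SIZE` are assumed values for a BERT tokenizer:

```python
import random

MASK_ID = 103       # assumed id of BERT's [MASK] token
VOCAB_SIZE = 30522  # assumed BERT vocabulary size

def mask_tokens(input_ids, mlm_probability=0.15, rng=None):
    """Sketch of the MLM masking rule: select ~15% of positions; of those,
    80% become [MASK], 10% become a random token, 10% stay unchanged.
    Labels are -100 everywhere else so the loss is only computed on
    the selected positions."""
    rng = rng or random.Random()
    inputs = list(input_ids)
    labels = [-100] * len(inputs)
    for i, tok in enumerate(input_ids):
        if rng.random() < mlm_probability:
            labels[i] = tok  # predict the original token at this position
            r = rng.random()
            if r < 0.8:
                inputs[i] = MASK_ID
            elif r < 0.9:
                inputs[i] = rng.randrange(VOCAB_SIZE)
            # else: keep the original token (model still predicts it)
    return inputs, labels
```

Positions with label -100 are ignored by the loss, so only the selected ~15% of tokens contribute to training.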
Things that may need to be changed in run_mlm.py:
1 By default the script caps max_seq_length at 1024 when reading and setting it; if the model's maximum sequence length differs (e.g. 512 for BERT), this may need to be changed.
2 The script concatenates multiple texts and splits them into chunks of max_seq_length (tokenized_datasets = tokenized_datasets.map(group_texts, ...)); depending on your data, this step may need to be removed.
3 AutoModelForMaskedLM/BertForMaskedLM pre-trains only the MLM task, not the NSP task. To add the NSP task you need BertForPreTraining. BertForMaskedLM lacks the following parameters, so NSP cannot be trained with it: bert.pooler.dense.weight; bert.pooler.dense.bias; cls.seq_relationship.weight;
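The grouping step mentioned in point 2 can be sketched as below. This is a simplified version of the group_texts logic (the real script works on tokenizer output dicts and, in run_clm.py, also copies input_ids into labels):

```python
def group_texts(examples, block_size=128):
    """Concatenate all token sequences in a batch, drop the tail that does
    not fill a whole block, and split the rest into chunks of block_size."""
    # Flatten each column (e.g. input_ids, attention_mask) into one long list.
    concatenated = {k: [t for seq in examples[k] for t in seq] for k in examples}
    total_length = len(next(iter(concatenated.values())))
    # Drop the remainder so every chunk has exactly block_size tokens.
    total_length = (total_length // block_size) * block_size
    return {
        k: [v[i : i + block_size] for i in range(0, total_length, block_size)]
        for k, v in concatenated.items()
    }
```

For example, three texts totalling 9 tokens with block_size=4 yield two chunks of 4 tokens; the ninth token is dropped. This is why the step should be removed when each example must stay intact (e.g. sentence-level tasks).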