LLM: fine-tuning a pre-trained language model

Model training

GPT-2/GPT and causal language modeling

Model used: AutoModelForCausalLM

[examples/pytorch/language-modeling#gpt-2gpt-and-causal-language-modeling]

[examples/pytorch/language-modeling/run_clm.py]

Example:

[colab.research.google.com/Causal Language modeling]
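A minimal sketch of what causal-LM fine-tuning with AutoModelForCausalLM looks like via the Trainer API, assuming a GPT-2 checkpoint; the wikitext dataset, sequence length, and training arguments are illustrative placeholders, not run_clm.py's exact defaults:

```python
# Minimal causal-LM fine-tuning sketch (illustrative; run_clm.py exposes many more options).
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "gpt2"  # any causal-LM checkpoint works here
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

# Illustrative dataset; replace with your own text corpus.
raw = load_dataset("wikitext", "wikitext-2-raw-v1")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])
tokenized = tokenized.filter(lambda ex: len(ex["input_ids"]) > 0)  # drop empty lines

# mlm=False makes the collator copy input_ids into labels (the model shifts them internally).
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="clm-out", per_device_train_batch_size=8, num_train_epochs=1),
    train_dataset=tokenized["train"],
    data_collator=collator,
)
trainer.train()
```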

RoBERTa/BERT/DistilBERT and masked language modeling

[examples/pytorch/language-modeling#robertabertdistilbert-and-masked-language-modeling]

[examples/pytorch/language-modeling/run_mlm.py]

Model used: AutoModelForMaskedLM, specifically BertForMaskedLM.
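A minimal masked-LM fine-tuning sketch with AutoModelForMaskedLM, again using the Trainer API; the dataset, masking probability, and training arguments are illustrative placeholders rather than run_mlm.py's exact behavior:

```python
# Minimal masked-LM fine-tuning sketch (illustrative; run_mlm.py exposes many more options).
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)  # resolves to BertForMaskedLM

raw = load_dataset("wikitext", "wikitext-2-raw-v1")  # illustrative corpus

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])
tokenized = tokenized.filter(lambda ex: len(ex["input_ids"]) > 2)  # drop empty lines ([CLS][SEP] only)

# The collator randomly masks tokens and builds the MLM labels.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="mlm-out", per_device_train_batch_size=8, num_train_epochs=1),
    train_dataset=tokenized["train"],
    data_collator=collator,
)
trainer.train()
```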

Things that may need to be changed in run_mlm.py:
1 The script reads and sets max_seq_length with a default cap of 1024; if this differs from the model's maximum sequence length, it may need to be modified.

2 The script concatenates multiple texts and splits them into chunks of max_seq_length (tokenized_datasets = tokenized_datasets.map(group_texts, ...)); depending on your data, this step may need to be removed. A simplified version of this logic is sketched after the list.

3 AutoModelForMaskedLM/BertForMaskedLM pre-trains only the MLM objective, without the NSP task. To add NSP, use BertForPreTraining instead. The MLM-only model lacks the following parameters, so they cannot be trained: bert.pooler.dense.weight, bert.pooler.dense.bias, cls.seq_relationship.weight (see the comparison sketch below).
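For points 1 and 2, the grouping step looks roughly like the following: a simplified sketch continuing from the MLM example above (reusing its tokenizer and tokenized dataset); the real script handles more edge cases:

```python
# Simplified sketch of the max_seq_length / group_texts logic in run_mlm.py,
# continuing from the MLM example above (reuses `tokenizer` and `tokenized`).
max_seq_length = min(tokenizer.model_max_length, 1024)  # the script caps this at 1024

def group_texts(examples):
    # Concatenate every field (input_ids, attention_mask, ...) across the batch,
    # drop the remainder, and cut the stream into max_seq_length-sized chunks.
    concatenated = {k: sum(examples[k], []) for k in examples.keys()}
    total_length = (len(concatenated["input_ids"]) // max_seq_length) * max_seq_length
    return {
        k: [t[i : i + max_seq_length] for i in range(0, total_length, max_seq_length)]
        for k, t in concatenated.items()
    }

# This is the call to drop if each example should remain a separate sequence.
grouped = tokenized.map(group_texts, batched=True)
```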
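For point 3, the missing parameters can be checked directly by comparing the two model classes; a small inspection sketch, not part of run_mlm.py:

```python
# Compare parameter names of BertForMaskedLM (MLM only) vs BertForPreTraining (MLM + NSP).
from transformers import BertForMaskedLM, BertForPreTraining

mlm_model = BertForMaskedLM.from_pretrained("bert-base-uncased")
pretraining_model = BertForPreTraining.from_pretrained("bert-base-uncased")

mlm_params = {name for name, _ in mlm_model.named_parameters()}
pretraining_params = {name for name, _ in pretraining_model.named_parameters()}

# Expected to include bert.pooler.dense.* and cls.seq_relationship.* (pooler + NSP head).
print(sorted(pretraining_params - mlm_params))
```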


Source: blog.csdn.net/pipisorry/article/details/131170284