LLM - ChatGPT Training Process

Process introduction

  • The process mainly includes two stages: model pre-training and instruction fine-tuning.
    • Model pre-training: collect massive amounts of text and train an autoregressive decoder on it without supervision;
      the decoder models $P(O_T \mid O_{t<T})$, i.e. it predicts each token from the tokens before it, and the loss function is the cross-entropy (CE) loss (see the first sketch after this list).
    • Instruction fine-tuning: add a task prompt to the input text.
      • For example, feed the model "Translate the text into English: 无监督训练. Translation:" and train it to output "Unsupervised training".
      • It is still an autoregressive training process. The loss function is the same as in pre-training, but the input data follows an instruction template (see the second sketch after this list).
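
A minimal sketch of the pre-training objective in code, assuming PyTorch; the random tensors below stand in for a real decoder's inputs and outputs, and the shapes and names are illustrative rather than the actual ChatGPT implementation:

```python
import torch
import torch.nn.functional as F

# Toy next-token prediction setup. In real pre-training the logits come
# from an autoregressive decoder; here random tensors stand in for them.
vocab_size = 100
batch, seq_len = 2, 8

tokens = torch.randint(0, vocab_size, (batch, seq_len))  # token ids of a text chunk
logits = torch.randn(batch, seq_len, vocab_size)         # decoder output per position

# Predict token t+1 from tokens <= t: drop the last position's logits
# and the first target token, then flatten for the loss.
pred = logits[:, :-1, :].reshape(-1, vocab_size)
target = tokens[:, 1:].reshape(-1)

# Cross-entropy (CE) loss over the next-token distribution.
loss = F.cross_entropy(pred, target)
print(loss.item())
```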
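
Instruction fine-tuning reuses exactly this loss; only the data changes. Below is a sketch of how a single instruction-formatted sample might be built, with a made-up character-level "tokenizer" purely for illustration (real systems use a subword tokenizer):

```python
# Hypothetical instruction-tuning sample: a task prompt is wrapped around
# the input text, and training still predicts the next token.
prompt = "Translate the text into English: 无监督训练. Translation:"
answer = " Unsupervised training."

# Pretend tokenizer: one integer id per character (illustration only).
text = prompt + answer
ids = [ord(c) for c in text]

# Labels mirror the ids, but prompt positions are masked with -100 so the
# cross-entropy loss is computed only on the answer tokens.
labels = [-100] * len(prompt) + ids[len(prompt):]

print(len(ids), len(labels))
```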

Instruction fine-tuning


  • Instruction fine-tuning is generally divided into three stages:
    • Collect a large number of questions from users, invite professionals to write high-quality answers, and fine-tune the generative model on these question–answer pairs;
    • Let the fine-tuned generative model produce several answers for each user question, invite human annotators to rate the quality of the answers, and train a reward model on the scored data (see the sketch after this list);
    • Combine the generative model with the reward model: the system generates answers on its own, evaluates their quality with the reward model, and is continuously optimized against that feedback.
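
A minimal sketch of the second stage, the reward model, using the pairwise ranking loss commonly used to train reward models from human preference data; the scores below are placeholders for what a real reward model would output, and this is not presented as the exact ChatGPT recipe:

```python
import torch
import torch.nn.functional as F

# For each question, a human annotator has marked one answer as better
# than another. A real reward model maps (question, answer) -> scalar
# score; random tensors stand in for those scores here.
score_chosen = torch.randn(4, requires_grad=True)    # scores of the preferred answers
score_rejected = torch.randn(4, requires_grad=True)  # scores of the worse answers

# Pairwise ranking loss: push the preferred answer's score above the other's.
loss = -F.logsigmoid(score_chosen - score_rejected).mean()
loss.backward()
print(loss.item())
```

In the third stage, the generative model is then optimized (e.g. with a policy-gradient method) so that it produces answers this reward model scores highly.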

Reference blog

brightliao: ChatGPT model training
