GPT Practical Series - Plain Talk on Training Large Language Models (LLMs)

A GPT model generates a text sequence by predicting the next token, one token at a time. In the current landscape of pre-trained models, OpenAI leads the way, with Google and Meta following at some distance. Training a large model generally follows a staged process.
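To make the next-token idea concrete, here is a minimal sketch of a greedy decoding loop using the Hugging Face transformers library. The choice of gpt2, the prompt, and the 40-token limit are illustrative assumptions, not part of the original article.

```python
# Minimal greedy decoding loop: the model repeatedly predicts the next token
# and appends it to the context until end-of-text or a length limit.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")            # illustrative model choice
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

input_ids = tokenizer("To be, or not to be", return_tensors="pt").input_ids
with torch.no_grad():
    for _ in range(40):                                      # generate up to 40 new tokens
        logits = model(input_ids).logits                     # [1, seq_len, vocab_size]
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)  # greedy pick
        input_ids = torch.cat([input_ids, next_id], dim=-1)
        if next_id.item() == tokenizer.eos_token_id:         # <|endoftext|> for GPT-2
            break

print(tokenizer.decode(input_ids[0]))
```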


Pre-training phase

First, the model is pre-trained on a large-scale corpus; this produces the base model.

  • The input to pre-training is tokenized text packed into fixed-length rows, with <|endoftext|> inserted to separate different documents. <|endoftext|> is the document delimiter the model sees during training (see the packing sketch below).
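A rough sketch of how documents might be tokenized and packed into fixed-length training rows, with <|endoftext|> between documents. The use of tiktoken, the GPT-2 encoding, and the block size of 1024 are illustrative assumptions.

```python
# Sketch: tokenize documents, join them with the <|endoftext|> delimiter,
# then slice the resulting token stream into fixed-length training rows.
import tiktoken

enc = tiktoken.get_encoding("gpt2")          # GPT-2 BPE; <|endoftext|> has id 50256
BLOCK_SIZE = 1024                            # illustrative context length

docs = ["First document text ...", "Second document text ..."]  # toy placeholders
stream = []
for doc in docs:
    stream.extend(enc.encode(doc))
    stream.append(enc.eot_token)             # document delimiter <|endoftext|>

# Drop the ragged tail and reshape the stream into rows of BLOCK_SIZE tokens.
# (With a real corpus there are many such rows; the toy docs above yield none.)
n_blocks = len(stream) // BLOCK_SIZE
rows = [stream[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE] for i in range(n_blocks)]
```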

Take training on a dataset of Shakespeare's works as an example. At the start of training the parameters are randomly initialized, so the predictions are completely random. As training iterates, the text sampled from the model after 250, 500, 5,000, and 30,000 iterations steadily converges, and beyond a certain point the model can generate coherent text sequences.

  • Training is divided into two main stages. In the pre-training stage, the model learns powerful general representations from a large amount of unlabeled data, producing the base model. In the fine-tuning stage, the base model is trained further on a small amount of annotated data from the domain of interest. Staged training greatly reduces the amount of data needed for fine-tuning.

  • A dialogue model needs to respond to human instructions or questions, but a pre-trained base model only completes document text and cannot directly answer questions.

  • However, you can write the dialogue as a document-style prompt, let the pre-trained base model complete the document, and then assemble the completion back into a dialogue turn. In this way a pre-trained base model can be made to behave like a dialogue model (see the sketch below).
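A minimal sketch of this idea: the conversation is written as a document for the base model to complete, and the completion is cut at the next turn marker. The "User:"/"Assistant:" labels and the use of GPT-2 are illustrative assumptions, not OpenAI's actual format.

```python
# Few-shot prompting: a plain base model only continues documents, so we write
# the dialogue as a document and let the model complete the assistant turn.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")            # illustrative base model
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

prompt = (
    "The following is a conversation between a helpful assistant and a user.\n"
    "User: What is the capital of France?\n"
    "Assistant: The capital of France is Paris.\n"
    "User: And of Japan?\n"
    "Assistant:"
)
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=30, do_sample=False,
                        pad_token_id=tokenizer.eos_token_id)
completion = tokenizer.decode(output[0][inputs.input_ids.shape[1]:])
print(completion.split("User:")[0].strip())   # keep only the assistant turn
```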

Fine-tuning phase

OpenAI’s ChatGPT implementation solution

Step 1 of fine-tuning the dialogue model: SFT
  • First, supervised fine-tuning (SFT) is performed on a small amount of manually labeled data. The training set consists mainly of question-answer pairs, on the order of 10,000 to 100,000 examples.
  • In an SFT data example, the prompt is a question and the response is a reply written by an annotator. Annotated responses must be helpful, truthful, harmless, ethical, and legal (a data-format sketch follows below).
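To illustrate the data format, here is a sketch of how one prompt/response pair could be turned into SFT training tensors, masking the prompt tokens so the loss is computed only on the response. The field names and the use of -100 as the ignore index follow common practice (e.g. PyTorch's cross-entropy), not a specific OpenAI format.

```python
# Sketch: build input_ids and labels for one SFT example.  The loss is applied
# only to response tokens; prompt tokens are masked with -100.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")      # illustrative tokenizer

example = {                                            # hypothetical annotated pair
    "prompt": "Explain what overfitting is.\n",
    "response": "Overfitting is when a model memorizes the training data ...",
}

prompt_ids = tokenizer(example["prompt"]).input_ids
response_ids = tokenizer(example["response"]).input_ids + [tokenizer.eos_token_id]

input_ids = prompt_ids + response_ids
labels = [-100] * len(prompt_ids) + response_ids       # ignore prompt tokens in the loss
```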
Step 2 of fine-tuning the dialogue model: Reward Modeling
  • A reward model (RM) is trained to evaluate the quality of generated responses.
  • Prepare an annotated dataset (on the order of 100,000 to 1 million comparisons) and train the model to distinguish better responses from worse ones.
  • RM dataset example: given a prompt question, several responses (for example, three) are generated by the SFT model from the previous step, and annotators are asked to rank them from best to worst (see the ranking-loss sketch below).
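A common way to train a reward model on such ranked data is a pairwise loss that pushes the score of the preferred ("chosen") response above the rejected one. The sketch below assumes the RM outputs one scalar score per response; the dummy score tensors stand in for those outputs.

```python
# Pairwise reward-model loss: -log(sigmoid(r_chosen - r_rejected)).
import torch
import torch.nn.functional as F

def reward_ranking_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    """r_chosen / r_rejected: scalar scores for the preferred / rejected responses."""
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# Example with dummy scores (in practice these come from the reward model head):
r_chosen = torch.tensor([1.2, 0.4])
r_rejected = torch.tensor([0.3, 0.9])
print(reward_ranking_loss(r_chosen, r_rejected))   # lower when chosen > rejected
```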
Step 3 of fine-tuning the dialogue model: RL
  • Reinforcement learning from human feedback (RLHF) trains the model against the reward model from the previous step: the reward scores are used to adjust the language model's generation behavior. For example, if the first answer receives a high reward, every token in that answer is reinforced and becomes more likely to be sampled in the future, while the tokens of lower-reward answers become less likely (see the policy-gradient sketch below).
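The token-reinforcement idea can be sketched as a simple policy-gradient update: the token log-probabilities of a sampled answer are weighted by the reward it received. Real RLHF pipelines such as PPO also add a KL penalty against a reference model and clipping, which are omitted here; all tensors below are dummy placeholders.

```python
# Minimal policy-gradient sketch of the RLHF idea: answers with a high reward
# have their tokens' log-probabilities pushed up, low-reward answers pushed down.
import torch

# Dummy per-token log-probs of two sampled answers (from the policy model)
# and the scalar rewards the reward model assigned to each answer.
logprobs = [torch.tensor([-1.2, -0.8, -0.5], requires_grad=True),
            torch.tensor([-0.9, -1.5], requires_grad=True)]
rewards = torch.tensor([1.0, -0.4])
baseline = rewards.mean()                      # simple variance-reduction baseline

loss = torch.stack([-(r - baseline) * lp.sum()
                    for lp, r in zip(logprobs, rewards)]).mean()
loss.backward()   # gradients increase the probability of high-reward tokens
```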

Why RLHF is needed

  • In principle, a pre-trained model, an SFT model, or an RLHF model can all be deployed as a GPT dialogue model. The simple reason for choosing RLHF is that it works better: the answers it generates are more to humans' liking.
  • A deeper reason RLHF helps is that discrimination is easier than generation. It is hard for annotators to write SFT question-answer pairs from scratch, but much simpler to judge which of several answers generated by the SFT model is better.
  • UC Berkeley publishes a model evaluation leaderboard on which GPT-4 ranks strongest; the top three entries are all RLHF models, while the remaining models use only SFT.




GPT column articles:

GPT Practical Series - Calculation accuracy and quantization of large models such as Baichuan2

GPT Practical Series - GPT training: Pretraining, SFT, Reward Modeling, RLHF

GPT Practical Series - P-Tuning localized training of ChatGLM2 and other LLM models: what exactly does it do? (Part 2)

GPT Practical Series - P-Tuning localized training of ChatGLM2 and other LLM models: what exactly does it do? (Part 1)

GPT Practical Series - ChatGLM3 local deployment with CUDA 11 + 1080Ti + 24G graphics card: a practical solution

GPT Practical Series - Interpretation of fine-tuning training parameters for the ChatGLM2 model

GPT Practical Series - How to fine-tune the ChatGLM2 model with your own data

GPT Practical Series - ChatGLM2 deployment with Ubuntu + CUDA 11 + 24G video memory: a practical solution

GPT Practical Series - Baichuan2 localized deployment: a practical solution

GPT Practical Series - Exploring text generation with large models such as GPT


Decision Engine column:
Building a lightweight REST API service with Falcon

Decision Engine - Using Drools to implement simple firewall policies


Origin: blog.csdn.net/Alex_StarSky/article/details/83933157