[Paper Notes] ChatGPT Series 2.2: DeepSpeed-Chat Training Pipeline Scripts

 Step 1 - Supervised Fine-Tuning

# Move into the first step of the pipeline
cd training/step1_supervised_finetuning/

# Run the training script

bash training_scripts/single_gpu/run_1.3b.sh
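For reference, the single-GPU script is a thin wrapper that launches the step's main.py through the deepspeed launcher. The sketch below is a simplified version of what run_1.3b.sh typically contains; the exact flags and defaults may differ across DeepSpeed-Chat versions, and the output path is an assumption.

# Sketch of run_1.3b.sh (simplified; flags may vary by version)
OUTPUT=./output
mkdir -p $OUTPUT
deepspeed --num_gpus 1 main.py \
   --model_name_or_path facebook/opt-1.3b \
   --gradient_accumulation_steps 8 \
   --lora_dim 128 \
   --zero_stage 0 \
   --deepspeed \
   --output_dir $OUTPUT &> $OUTPUT/training.log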

# Evaluate the model

bash evaluation_scripts/run_prompt.sh
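The evaluation script compares the fine-tuned model against the base model on a set of prompts. A minimal sketch of the call inside run_prompt.sh, assuming the fine-tuned checkpoint was saved to ./output:

# Sketch of run_prompt.sh (simplified; checkpoint path is an assumption)
export CUDA_VISIBLE_DEVICES=0
python prompt_eval.py \
    --model_name_or_path_baseline facebook/opt-1.3b \
    --model_name_or_path_finetune ./output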

 Step 2 - Reward Model

# Move into the second step of the pipeline
cd training/step2_reward_model_finetuning

# Run the training script

bash training_scripts/single_gpu/run_350m.sh
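The reward model in this recipe is a smaller OPT-350m, since it only needs to score responses rather than generate them. A simplified sketch of run_350m.sh, with the output path and exact flag values as assumptions:

# Sketch of run_350m.sh (simplified; flags may vary by version)
OUTPUT=./output
mkdir -p $OUTPUT
deepspeed --num_gpus 1 main.py \
   --model_name_or_path facebook/opt-350m \
   --num_padding_at_beginning 1 \
   --gradient_accumulation_steps 4 \
   --zero_stage 0 \
   --deepspeed \
   --output_dir $OUTPUT &> $OUTPUT/training.log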

# Evaluate the model

bash evaluation_scripts/run_eval.sh
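run_eval.sh wraps rw_eval.py, which scores a few chosen/rejected answer pairs with the trained reward model; a higher score for the chosen answer indicates the preference was learned. A minimal sketch, assuming the checkpoint is in ./output:

# Sketch of run_eval.sh (simplified; checkpoint path is an assumption)
python rw_eval.py --model_name_or_path ./output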

 Step 3 - RLHF

As the most complex step of the three-step InstructGPT pipeline, this stage relies on DeepSpeed-Chat's Hybrid Engine, which provides enough acceleration to avoid excessive training time (and cost).

See the documentation on Reinforcement Learning from Human Feedback (RLHF) for more information. If you already have fine-tuned actor and reward model checkpoints, you can simply run the following script to launch PPO training.

# Move into the final step of the pipeline
cd training/step3_rlhf_finetuning/

# Run the training script

bash training_scripts/single_gpu/run_1.3b.sh
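The step-3 script expects the actor (step 1) and critic/reward (step 2) checkpoints, which it forwards to main.py as --actor_model_name_or_path and --critic_model_name_or_path. A usage sketch, assuming they are passed as the first two positional arguments and that the earlier steps used the default ./output directories:

# Example invocation (checkpoint paths and argument order are assumptions)
bash training_scripts/single_gpu/run_1.3b.sh \
    ../step1_supervised_finetuning/output \
    ../step2_reward_model_finetuning/output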


Reposted from blog.csdn.net/Trance95/article/details/130427042