[Paper Notes] ChatGPT Series 2.3: DeepSpeed-Chat Reward Model Training

DeepSpeedExamples/applications/DeepSpeed-Chat at master · microsoft/DeepSpeedExamples · GitHub

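Step 1 (SFT) is skipped here.

Step 2 is Reward Model training. Installing deepspeed failed with an error along the way; the following blog post covers the fix: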

[linux] No such file or directory ':/usr/local/cuda/bin/nvcc' (心心喵's blog, CSDN)
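The linked post is not reproduced here, so the following is only a guess at what is going on. The path in the error message starts with a stray `:`, which usually suggests that `CUDA_HOME` was exported as something like `$CUDA_HOME:/usr/local/cuda` while `CUDA_HOME` was still empty. Assuming that is the cause, a minimal sketch of a check might look like the snippet below. `fix_cuda_home` is a hypothetical helper written for this note, not part of deepspeed or CUDA.

```shell
# Hypothetical helper (not part of deepspeed): strip any leading ':'
# characters from a path value and print the cleaned result.
fix_cuda_home() {
  value="$1"
  # "${value#:}" removes one leading ':'; loop until none is left
  while [ "${value#:}" != "$value" ]; do
    value="${value#:}"
  done
  printf '%s\n' "$value"
}

# Assumption: the build locates nvcc via CUDA_HOME, so clean that variable
CUDA_HOME="$(fix_cuda_home "${CUDA_HOME:-}")"
export CUDA_HOME
echo "CUDA_HOME=${CUDA_HOME}"

# Check that nvcc is actually present under CUDA_HOME
if [ -n "$CUDA_HOME" ] && [ -x "$CUDA_HOME/bin/nvcc" ]; then
  echo "nvcc found"
else
  echo "nvcc not found under CUDA_HOME"
fi
```

If `nvcc` is reported as missing, the place to look is probably the line in the shell profile (for example `~/.bashrc`) that sets `CUDA_HOME`; after fixing it, retry the deepspeed install.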

2. Reward Model

# Install the Python dependencies
pip install transformers --use-feature=2020-resolver
pip install datasets
pip install -r requirements.txt
# Move into the second step of the pipeline
cd training/step2_reward_model_finetuning

# Run the training script
bash training_scripts/s


Reposted from blog.csdn.net/Trance95/article/details/130427226