Hands-on: Reproducing ChatGPT with Colossal-AI

        Many people have read articles about reproducing the ChatGPT training process with Colossal-AI, but some have still sighed, "It's explained very clearly, yet I can't do it myself." Below I share the hands-on process as a reference for anyone interested in reproducing the ChatGPT pipeline.

1. Environment setup

1. Rent a P40 GPU server on Tencent Cloud (a 16 GB T4 GPU cannot run this and goes OOM). The P40 has 24 GB of GPU memory, and the software environment is Ubuntu 18.04 + torch 1.9.
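
As a quick sanity check (assuming the NVIDIA driver and PyTorch are already installed on the host), you can confirm that the GPU and the torch version are visible:

nvidia-smi
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"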

2. Use git clone to download the ColossalAI source code, using https://ghproxy.com to speed up the download.

git clone https://ghproxy.com/https://github.com/hpcaitech/ColossalAI
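
After cloning, the ChatGPT application used in the following steps lives under applications/ChatGPT; a quick listing confirms the checkout is complete:

cd ColossalAI
ls applications/ChatGPT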

3. Set up the NVIDIA Docker runtime environment.
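
One common way to do this on Ubuntu 18.04, following NVIDIA's published instructions for nvidia-docker2 at the time (check the current NVIDIA Container Toolkit docs, as the repository layout may have changed since):

distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
sudo apt-get install -y nvidia-docker2
sudo systemctl restart docker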

4. Use an NVIDIA image (for example nvcr.io/nvidia/pytorch:22.05-py3). Note: the hpcaitech/colossalai:0.2.5 image does not work with the --gpus parameter.

sudo docker pull nvcr.io/nvidia/pytorch:22.05-py3
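
A quick smoke test that the image and the --gpus flag work together:

sudo docker run --rm --gpus all nvcr.io/nvidia/pytorch:22.05-py3 nvidia-smi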

5. Enter the ColossalAI directory and create a gpt container

sudo docker run --name gpt --gpus=all --ipc=host --rm -it -v $PWD:/gpt -p 6006 -p 8888 --ulimit memlock=-1 -v /etc/localtime:/etc/localtime:ro -d nvcr.io/nvidia/pytorch:22.05-py3

6. Enter the gpt container

sudo docker exec -it gpt /bin/bash

7. In the container, install the chatgpt package and its dependencies from /gpt/applications/ChatGPT, using the Douban PyPI mirror to speed things up.

cd /gpt/applications/ChatGPT
pip install . -i https://pypi.douban.com/simple

2. Train with prompt data

1. Enter the examples directory, download prompts.csv, and start training with prompts.

python train_prompts.py prompts.csv --strategy naive

2. After training completes, two model files are generated, and GPU memory usage is about 9 GB.
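
While training runs, GPU memory usage can be watched from the host or inside the container, and the generated checkpoint files land in the examples directory (assuming the script's default save paths, the checkpoints are .pt files):

watch -n 5 nvidia-smi
ls -lh *.pt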

3. Train the reward model

1. On the host server (outside Docker), install git-lfs to manage the large model files.

curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
sudo apt-get install git-lfs
git lfs install
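
To confirm the installation succeeded:

git lfs version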

2. Go to the ColossalAI/applications/ChatGPT/examples directory on the host server and download bloom-560m.

git clone https://huggingface.co/bigscience/bloom-560m

This takes a while, since more than 3 GB of model files need to be downloaded; please be patient.
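
Once the clone finishes, checking the directory size is an easy way to confirm the actual weights were downloaded rather than just the LFS pointer files:

du -sh bloom-560m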

3. Train the reward model in the gpt container

python train_reward_model.py --pretrain bloom-560m

4. Run the benchmark

Please refer to README.md to proceed.

5. Notes

        In the example scripts, the ones with the dummy suffix use randomly generated prompts, while the prompts scripts use prompts.csv. Fine-tuning uses the gpt2 model, reward model training uses the bloom model, and the benchmark uses the opt model. These stages are not yet wired together into a complete pipeline, so you can connect them according to your own understanding: first fine-tune an actor model from a large pretrained model (such as gpt2/bloom/opt) on prompt data, then train a reward model on human-feedback-labeled data, and finally use the reward model to train the actor model, obtaining a model aligned with human-feedback evaluation.
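
Put together, a rough sketch of the full flow looks like the following; the first two stages reuse the example scripts shown above, while the third stage has no ready-made script in the examples yet, so it is only a placeholder for your own implementation:

# 1) fine-tune an actor model on prompt data (gpt2 in the examples)
python train_prompts.py prompts.csv --strategy naive
# 2) train a reward model on human-feedback data (bloom-560m here)
python train_reward_model.py --pretrain bloom-560m
# 3) use the trained reward model to further train the actor model with
#    reinforcement learning; no ready-made script exists yet, so this step
#    is left to your own implementation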

Source: https://blog.csdn.net/wxl781227/article/details/129179414