Many people have probably read the article about reproducing the ChatGPT training process with Colossal-AI, but some have sighed, "It's all very clear, yet I can't do it myself." Here I lay out the actual hands-on process as a reference for anyone interested in reproducing the ChatGPT pipeline.
1. Environment setup:
1. Purchase a P40 GPU server on Tencent Cloud (a 16 GB T4 GPU cannot run this; it OOMs). The P40 has 24 GB of GPU memory, and the software environment is Ubuntu 18.04 + torch 1.9.
2. Use git clone to download the ColossalAI source code, with https://ghproxy.com as a mirror for acceleration.
git clone https://ghproxy.com/https://github.com/hpcaitech/ColossalAI
3. Set up the NVIDIA Docker runtime environment.
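Step 3 gives no commands in the original. Assuming Ubuntu 18.04 with Docker already installed, the NVIDIA container runtime can be set up roughly as follows; the repository URL and package name are the standard nvidia-docker2 setup, not taken from the original article:

```shell
# Add NVIDIA's package repository for the container runtime (Ubuntu 18.04)
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list \
  | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

# Install nvidia-docker2 and restart the Docker daemon so --gpus works
sudo apt-get update && sudo apt-get install -y nvidia-docker2
sudo systemctl restart docker

# Sanity check: the GPU should be visible from inside a container
sudo docker run --rm --gpus all nvidia/cuda:11.3.1-base-ubuntu18.04 nvidia-smi
```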
4. Pull an NVIDIA image (e.g. nvcr.io/nvidia/pytorch:22.05-py3). Note: the hpcaitech/colossalai:0.2.5 image cannot be used with the --gpus parameter.
sudo docker pull nvcr.io/nvidia/pytorch:22.05-py3
5. Enter the ColossalAI directory and create a gpt container.
sudo docker run --name gpt --gpus=all --ipc=host --rm -it -v $PWD:/gpt -p 6006:6006 -p 8888:8888 --ulimit memlock=-1 -v /etc/localtime:/etc/localtime:ro -d nvcr.io/nvidia/pytorch:22.05-py3
6. Enter the gpt container.
sudo docker exec -it gpt /bin/bash
7. Inside the container, install chatgpt and its dependencies under /gpt/applications/ChatGPT, using the Douban mirror to speed up pip.
pip install . -i https://pypi.douban.com/simple
2. Train with prompt data
1. Enter the examples directory, download prompts.csv, and start training with prompts.
python train_prompts.py prompts.csv --strategy naive
2. After training completes, two model files are generated; GPU memory usage is about 9 GB.
3. Start training the reward model
1. Install git-lfs on the host server (outside Docker) to manage the large files in the model.
curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
sudo apt-get install git-lfs
git lfs install
2. Go to the ColossalAI/applications/ChatGPT/examples directory on the host server and download bloom-560m.
git clone https://huggingface.co/bigscience/bloom-560m
This takes quite a while: more than 3 GB of model files must be downloaded, so please be patient.
3. Train the reward model inside the gpt container.
python train_reward_model.py --pretrain bloom-560m
4. Run benchmark
Please refer to the README.md to proceed.
5. Description
Among the example scripts, those with the dummy suffix use randomly generated prompts, while the prompts variants use prompts.csv. Fine-tuning uses the gpt2 model, reward-model training uses the bloom model, and the benchmark uses the opt model. There is currently no complete end-to-end pipeline connecting them, so you can wire them together according to your own understanding. That is: first fine-tune an actor model from a large pretrained model (e.g. gpt2/bloom/opt) using prompts; then train a reward model on labeled data with human feedback; and finally use the reward model to train the actor model, obtaining a model aligned with human-feedback evaluation.
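The three-stage sequence described above can be sketched as follows. Only the first two commands appear in the examples directory earlier in this walkthrough; the third stage has no ready-made script at the time of writing, and the comment describes what you would need to assemble yourself, not an existing ColossalAI command:

```shell
# Stage 1: fine-tune an actor on prompt data (gpt2 by default)
python train_prompts.py prompts.csv --strategy naive

# Stage 2: train a reward model on human-preference data (bloom-560m here)
python train_reward_model.py --pretrain bloom-560m

# Stage 3: PPO-train the actor against the reward model.
# No script ties the stages together out of the box; this step must be
# assembled yourself from the chatgpt library according to your own understanding.
```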