[AI Combat] Building the Chinese language model Chinese-LLaMA-Alpaca-33B from scratch on top of LLaMA-33B
Introduction
On February 25, 2023, Meta released a new AI-based large language model for the research community, joining Microsoft, Google, and the other companies that ChatGPT has spurred into the AI race.
Meta's LLaMA, short for "Large Language Model Meta AI", is available under a non-commercial license to researchers and practitioners in government, the community, and academia.
The open-source release includes LLaMA at four parameter scales (7B, 13B, 33B, and 65B). LLaMA 65B and LLaMA 33B were trained on 1.4 trillion tokens, and even the smallest model, LLaMA 7B, was trained on 1 trillion tokens.
Like other large language models, LLaMA takes a sequence of words as input and predicts the next word, recursively generating text. Meta selected training text from the 20 most widely spoken languages, focusing on those written in Latin and Cyrillic alphabets.
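To make the recursive next-word prediction concrete, here is a minimal sketch of a greedy next-token generation loop using the Hugging Face transformers API; the "gpt2" model name and script name are only illustrative stand-ins, not part of this build:
# generate_sketch.py: minimal greedy next-token generation loop (illustrative)
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")           # stand-in causal LM
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tok("The capital of France is", return_tensors="pt").input_ids
for _ in range(10):                                   # emit 10 tokens, one at a time
    logits = model(ids).logits                        # scores over the vocabulary
    next_id = logits[0, -1].argmax()                  # most likely next token
    ids = torch.cat([ids, next_id.view(1, 1)], dim=-1)  # feed it back as input
print(tok.decode(ids[0]))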
This article walks through the complete process of building Chinese-LLaMA-Alpaca-33B from scratch on top of the LLaMA-33B language model.
Environment configuration
Environment build
System environment
- Ubuntu 20.04 LTS
- NVIDIA TESLA P40
- CUDA 11.7
- cuDNN 8
- Docker 18.09.5
Create the Docker container
Pull the Docker image:
docker pull nvcr.io/nvidia/pytorch:21.08-py3
Create the container:
nvidia-docker run -it -d \
    --name llama \
    -v /llm:/notebooks \
    -p 28888:8888 \
    -p 28889:8889 \
    -e TZ='Asia/Shanghai' \
    --shm-size 16G \
    nvcr.io/nvidia/pytorch:21.08-py3
Change /llm to your own path.
Enter the container:
docker exec -it llama env LANG=C.UTF-8 /bin/bash
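Optionally, verify that the GPU is visible inside the container; a minimal check (PyTorch ships preinstalled in the NGC image; the script name is illustrative):
# gpu_check.py: confirm CUDA is usable inside the container
import torch
print(torch.cuda.is_available())      # expect True
print(torch.cuda.get_device_name(0))  # expect a Tesla P40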
Install conda
Download:
cd /notebooks
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
Install:
bash Miniconda3-latest-Linux-x86_64.sh
Follow the prompts to complete the installation.
Add miniconda to the PATH:
export PATH="/root/miniconda3/bin:$PATH"
Create a conda environment:
conda create -n llama_30b python=3.10.9
Install dependencies
conda activate llama_30b
conda init
Exit the container and re-enter it:
docker exec -it llama env LANG=C.UTF-8 /bin/bash
cd /notebooks
conda activate llama_30b
Dependency installation
Install the exact versions specified below; otherwise the SHA256 checksums cannot be verified after merging:
pip install torch==1.13.1 -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install torchvision==0.14.1 -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install torchaudio==0.13.1 -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install transformers==4.28.1 -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install sentencepiece==0.1.97 -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install peft==0.3.0 -i https://pypi.tuna.tsinghua.edu.cn/simple
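After installation, a quick check that the pinned versions are active (script name is illustrative):
# version_check.py: confirm the pinned library versions
import torch, torchvision, transformers, peft
print(torch.__version__)         # expect 1.13.1
print(torchvision.__version__)   # expect 0.14.1
print(transformers.__version__)  # expect 4.28.1
print(peft.__version__)          # expect 0.3.0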
Code and model weight pull
Pull Chinese-LLaMA-Alpaca
git clone https://github.com/ymcui/Chinese-LLaMA-Alpaca.git
GitHub is occasionally flaky, so be patient. If the clone fails, run rm -rf Chinese-LLaMA-Alpaca and pull again.
Pull the llama-30b-hf model weights and code
git clone https://huggingface.co/decapoda-research/llama-30b-hf
Since the weight files are very large, the clone may fail; if it does, run rm -rf llama-30b-hf and pull again.
Pulling at noon is recommended, when speeds are relatively fast; the download takes about 2-3 hours (it depends heavily on your network bandwidth!).
File size view:
du -sh llama-30b-hf
output:
154G llama-30b-hf
View file list:
ls -l llama-30b-hf/
output:
total 80723436
-rw-r--r-- 1 root root 10646 Jul 4 11:59 LICENSE
-rw-r--r-- 1 root root 8313 Jul 4 11:59 README.md
-rw-r--r-- 1 root root 427 Jul 4 11:59 config.json
-rw-r--r-- 1 root root 124 Jul 4 11:59 generation_config.json
-rw-r--r-- 1 root root 1337620210 Jul 4 13:53 pytorch_model-00000-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul 4 12:15 pytorch_model-00001-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul 4 12:15 pytorch_model-00002-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul 4 12:14 pytorch_model-00003-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul 4 12:15 pytorch_model-00004-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul 4 12:14 pytorch_model-00005-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul 4 12:14 pytorch_model-00006-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul 4 12:15 pytorch_model-00007-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul 4 12:14 pytorch_model-00008-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul 4 12:29 pytorch_model-00009-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul 4 12:29 pytorch_model-00010-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul 4 12:29 pytorch_model-00011-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul 4 12:30 pytorch_model-00012-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul 4 12:29 pytorch_model-00013-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul 4 12:29 pytorch_model-00014-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul 4 12:29 pytorch_model-00015-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul 4 12:30 pytorch_model-00016-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul 4 12:44 pytorch_model-00017-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul 4 12:44 pytorch_model-00018-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul 4 12:44 pytorch_model-00019-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul 4 12:45 pytorch_model-00020-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul 4 12:44 pytorch_model-00021-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul 4 12:44 pytorch_model-00022-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul 4 12:44 pytorch_model-00023-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul 4 12:45 pytorch_model-00024-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul 4 12:59 pytorch_model-00025-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul 4 12:59 pytorch_model-00026-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul 4 12:59 pytorch_model-00027-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul 4 12:59 pytorch_model-00028-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul 4 12:59 pytorch_model-00029-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul 4 13:00 pytorch_model-00030-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul 4 12:59 pytorch_model-00031-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul 4 13:00 pytorch_model-00032-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul 4 13:14 pytorch_model-00033-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul 4 13:14 pytorch_model-00034-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul 4 13:14 pytorch_model-00035-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul 4 13:14 pytorch_model-00036-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul 4 13:14 pytorch_model-00037-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul 4 13:14 pytorch_model-00038-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul 4 13:15 pytorch_model-00039-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul 4 13:15 pytorch_model-00040-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul 4 13:16 pytorch_model-00041-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul 4 13:29 pytorch_model-00042-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul 4 13:53 pytorch_model-00043-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul 4 13:29 pytorch_model-00044-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul 4 13:29 pytorch_model-00045-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul 4 13:53 pytorch_model-00046-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul 4 13:29 pytorch_model-00047-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul 4 13:48 pytorch_model-00048-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul 4 13:29 pytorch_model-00049-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul 4 13:29 pytorch_model-00050-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul 4 13:31 pytorch_model-00051-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul 4 13:44 pytorch_model-00052-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul 4 13:44 pytorch_model-00053-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul 4 13:44 pytorch_model-00054-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul 4 13:44 pytorch_model-00055-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul 4 13:44 pytorch_model-00056-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul 4 13:44 pytorch_model-00057-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul 4 13:44 pytorch_model-00058-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul 4 13:45 pytorch_model-00059-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul 4 13:29 pytorch_model-00060-of-00061.bin
-rw-r--r-- 1 root root 1064974796 Jul 4 13:47 pytorch_model-00061-of-00061.bin
-rw-r--r-- 1 root root 47653 Jul 4 11:59 pytorch_model.bin.index.json
-rw-r--r-- 1 root root 2 Jul 4 11:59 special_tokens_map.json
-rw-r--r-- 1 root root 499723 Jul 4 13:44 tokenizer.model
-rw-r--r-- 1 root root 141 Jul 4 11:59 tokenizer_config.json
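Before merging, it is worth confirming that every shard referenced by the index file actually arrived; a small sketch (the script name and path are assumptions based on the layout above):
# shard_check.py: verify that all shards listed in the index exist on disk
import json, os

model_dir = "llama-30b-hf"  # adjust to your download path
with open(os.path.join(model_dir, "pytorch_model.bin.index.json")) as f:
    index = json.load(f)

shards = sorted(set(index["weight_map"].values()))
missing = [s for s in shards if not os.path.exists(os.path.join(model_dir, s))]
print(f"{len(shards)} shards referenced, {len(missing)} missing")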
Pull the chinese-llama-lora-33b model weights and code
git clone https://huggingface.co/ziqingyang/chinese-llama-lora-33b
File size view:
du -sh chinese-llama-lora-33b
output:
5.5G chinese-llama-lora-33b
View file list:
ls -l chinese-llama-lora-33b
output:
total 2836532
-rw-r--r-- 1 root root 315 Jul 4 15:46 README.md
-rw-r--r-- 1 root root 421 Jul 4 15:46 adapter_config.json
-rw-r--r-- 1 root root 2903823997 Jul 4 15:51 adapter_model.bin
-rw-r--r-- 1 root root 72 Jul 4 15:46 special_tokens_map.json
-rw-r--r-- 1 root root 757958 Jul 4 15:46 tokenizer.model
-rw-r--r-- 1 root root 166 Jul 4 15:46 tokenizer_config.json
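The adapter_config.json describes the LoRA adapter that will be merged into the base model; printing it is a quick way to confirm the download is intact (standard PEFT fields assumed; script name is illustrative):
# adapter_check.py: show the LoRA hyperparameters of the downloaded adapter
import json

with open("chinese-llama-lora-33b/adapter_config.json") as f:
    cfg = json.load(f)
print(cfg.get("r"), cfg.get("lora_alpha"), cfg.get("target_modules"))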
Merge the model weights
First merge to pth-format model weights, so that they can be verified:
cd ./Chinese-LLaMA-Alpaca
mkdir ./Chinese-LLaMA-33B-pth
python scripts/merge_llama_with_chinese_lora.py \
--base_model ../llama-30b-hf/ \
--lora_model ../chinese-llama-lora-33b/ \
--output_type pth \
--output_dir ./Chinese-LLaMA-33B-pth
The output model weight file is saved to: ./Chinese-LLaMA-33B-pth
Check the SHA256 after merging
Generate the SHA256 checksums:
cd ./Chinese-LLaMA-33B-pth
sha256sum consolidated.0*
output:
054e9b7dffa3b92a053ca32acac6e22b27c184ed2b8563f8e44e6570ba416357 consolidated.00.pth
a0fe86c45a0819f45a509776d82778b7de75fbff8d37afa97159b24de5448b7b consolidated.01.pth
13df5f74dc7bc1204076b1febef818fb3cec978de27bf8fc85c70e7d62282df9 consolidated.02.pth
f4f28106c343c5804613faa9852f29fbc60764366bcb0d37ef2811a17be2d336 consolidated.03.pth
For comparison, the reference SHA256 values for Chinese-LLaMA-33B are:
054e9b7dffa3b92a053ca32acac6e22b27c184ed2b8563f8e44e6570ba416357
a0fe86c45a0819f45a509776d82778b7de75fbff8d37afa97159b24de5448b7b
13df5f74dc7bc1204076b1febef818fb3cec978de27bf8fc85c70e7d62282df9
f4f28106c343c5804613faa9852f29fbc60764366bcb0d37ef2811a17be2d336
If the two sets of values match exactly, the merge succeeded; otherwise, check whether the downloaded data is complete and intact.
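Comparing four 64-character hashes by eye is error-prone; a small sketch that automates the comparison (reference values copied from above; run it inside ./Chinese-LLaMA-33B-pth):
# sha_check.py: compare computed SHA256 values against the reference values
import hashlib

expected = {
    "consolidated.00.pth": "054e9b7dffa3b92a053ca32acac6e22b27c184ed2b8563f8e44e6570ba416357",
    "consolidated.01.pth": "a0fe86c45a0819f45a509776d82778b7de75fbff8d37afa97159b24de5448b7b",
    "consolidated.02.pth": "13df5f74dc7bc1204076b1febef818fb3cec978de27bf8fc85c70e7d62282df9",
    "consolidated.03.pth": "f4f28106c343c5804613faa9852f29fbc60764366bcb0d37ef2811a17be2d336",
}
for name, ref in expected.items():
    h = hashlib.sha256()
    with open(name, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # hash in 1 MiB chunks
            h.update(chunk)
    print(name, "OK" if h.hexdigest() == ref else "MISMATCH")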
Then merge to huggingface-format model weights:
cd ./Chinese-LLaMA-Alpaca
mkdir ./Chinese-LLaMA-33B
python scripts/merge_llama_with_chinese_lora.py \
--base_model ../llama-30b-hf/ \
--lora_model ../chinese-llama-lora-33b/ \
--output_type huggingface \
--output_dir ./Chinese-LLaMA-33B
The output model weight files are saved to ./Chinese-LLaMA-33B. File listing:
total 77G
-rw-r--r-- 1 root root 573 Jul 5 02:15 config.json
-rw-r--r-- 1 root root 132 Jul 5 02:15 generation_config.json
-rw-r--r-- 1 root root 12G Jul 5 02:15 pytorch_model-00001-of-00007.bin
-rw-r--r-- 1 root root 12G Jul 5 02:16 pytorch_model-00002-of-00007.bin
-rw-r--r-- 1 root root 12G Jul 5 02:16 pytorch_model-00003-of-00007.bin
-rw-r--r-- 1 root root 12G Jul 5 02:18 pytorch_model-00004-of-00007.bin
-rw-r--r-- 1 root root 12G Jul 5 02:19 pytorch_model-00005-of-00007.bin
-rw-r--r-- 1 root root 12G Jul 5 02:20 pytorch_model-00006-of-00007.bin
-rw-r--r-- 1 root root 7.6G Jul 5 02:21 pytorch_model-00007-of-00007.bin
-rw-r--r-- 1 root root 49K Jul 5 02:21 pytorch_model.bin.index.json
-rw-r--r-- 1 root root 72 Jul 5 02:15 special_tokens_map.json
-rw-r--r-- 1 root root 741K Jul 5 02:15 tokenizer.model
-rw-r--r-- 1 root root 727 Jul 5 02:15 tokenizer_config.json
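Before wiring these weights into a UI, a lightweight smoke test loads only the tokenizer and config rather than all 77G of weights; a sketch, assuming the output path above:
# merge_smoke_test.py: check the merged tokenizer/config without loading weights
from transformers import AutoConfig, LlamaTokenizer

path = "./Chinese-LLaMA-33B"
tok = LlamaTokenizer.from_pretrained(path)
cfg = AutoConfig.from_pretrained(path)
# the Chinese-extended vocabulary should be larger than original LLaMA's 32000
print(len(tok), cfg.vocab_size)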
Build a test page
Use text-generation-webui to build the page
Pull text-generation-webui
git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui
pip install -r requirements.txt
The requirements.txt contains:
accelerate==0.20.3
colorama
datasets
einops
flexgen==0.1.7
gradio_client==0.2.5
gradio==3.33.1
markdown
numpy
pandas
Pillow>=9.5.0
pyyaml
requests
safetensors==0.3.1
sentencepiece
tqdm
scipy
Load the model and start the webui
mkdir logs
python server.py \
    --model-dir /notebooks/Chinese-LLaMA-Alpaca \
    --model Chinese-LLaMA-33B \
    --model_type LLaMA \
    --listen --listen-host 0.0.0.0 --listen-port 8888 \
    --auto-devices
Test
Address: http://10.192.xx:28888/
My inference speed:
Output generated in 832.65 seconds (0.09 tokens/s, 73 tokens, context 6, seed 233442323)
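If you want to sanity-check generation without the webui, the merged huggingface weights can also be driven directly from transformers; a minimal sketch (paths taken from above; device_map="auto" requires accelerate, which the webui requirements already install, and will offload to CPU on a single 24 GB P40):
# direct_generate.py: minimal generation test against the merged HF weights
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

path = "/notebooks/Chinese-LLaMA-Alpaca/Chinese-LLaMA-33B"
tok = LlamaTokenizer.from_pretrained(path)
model = LlamaForCausalLM.from_pretrained(
    path, torch_dtype=torch.float16, device_map="auto"  # split/offload automatically
)
inputs = tok("中国的首都是", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tok.decode(out[0], skip_special_tokens=True))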
References
https://github.com/ymcui/Chinese-LLaMA-Alpaca
Build the interface with text-generation-webui: https://github.com/oobabooga/text-generation-webui
https://github.com/ymcui/Chinese-LLaMA-Alpaca/blob/main/SHA256.md