[AI Combat] Building the Chinese-LLaMA-Alpaca-33B Chinese language model from scratch on top of LLaMA-33B

Introduction

On February 25, 2023, Meta released a new AI large language model for the research community, joining Microsoft, Google, and other companies spurred by ChatGPT to enter the AI race.

Meta's LLaMA stands for "Large Language Model Meta AI"; it is available under a non-commercial license to researchers and practitioners in government, civil society, and academia.

The open-source release includes models at four parameter scales (7B, 13B, 33B, and 65B). LLaMA 65B and LLaMA 33B were trained on 1.4 trillion tokens, and even the smallest model, LLaMA 7B, was trained on 1 trillion tokens.

Like other large language models, LLaMA takes a sequence of words as input and predicts the next word, recursively generating text. For this family of models, Meta selected training text from the 20 most widely spoken languages, focusing on those written in Latin and Cyrillic alphabets.
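As a toy illustration of this next-token loop (the bigram scores below are invented for the example and stand in for a real model's forward pass; they are not LLaMA's actual weights):

```python
# Greedy next-token generation: the "model" scores every candidate token
# given the current context, and the loop appends the highest-scoring one.
def greedy_generate(score, prompt, steps, vocab):
    tokens = list(prompt)
    for _ in range(steps):
        # pick the token the model considers most likely to come next
        tokens.append(max(vocab, key=lambda t: score(tokens, t)))
    return tokens

# Hypothetical bigram table: each token "votes" for its successor.
BIGRAMS = {("the", "cat"): 1.0, ("cat", "sat"): 1.0, ("sat", "down"): 1.0}

def score(context, candidate):
    return BIGRAMS.get((context[-1], candidate), 0.0)

print(greedy_generate(score, ["the"], 3, ["cat", "sat", "down"]))
# → ['the', 'cat', 'sat', 'down']
```

A real LLaMA model replaces `score` with a neural network that scores the full vocabulary at each step, but the generation loop has the same shape.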

This article walks through the complete process of building Chinese-LLaMA-Alpaca-33B from scratch on top of the LLaMA-33B language model.

Environment configuration

Environment build

  • System environment

    • Ubuntu 20.04LTS
    • NVIDIA TESLA P40
    • CUDA 11.7
    • cuDNN 8
    • Docker 18.09.5
  • Create the Docker container

    Pull the Docker image:

    docker pull nvcr.io/nvidia/pytorch:21.08-py3
    

    Create the container:

    nvidia-docker run -it -d \
        --name llama \
        -v /llm:/notebooks \
        -p 28888:8888 \
        -p 28889:8889 \
        -e TZ='Asia/Shanghai' \
        --shm-size 16G \
        nvcr.io/nvidia/pytorch:21.08-py3
    

    Replace /llm with your own path.

    Enter the container:

    docker exec -it llama  env LANG=C.UTF-8 /bin/bash
    
  • Install Miniconda

    Download:

    cd /notebooks
    wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
    

    Install:

    bash Miniconda3-latest-Linux-x86_64.sh
    

    Accept the prompts to complete the installation.

    Add Miniconda to the PATH:

    export PATH="/root/miniconda3/bin:$PATH"
    

    Create a conda environment:

    conda create -n llama_30b python=3.10.9
    
  • Install dependencies

    conda activate llama_30b
    conda init
    

    Exit the container, then re-enter it:

    docker exec -it llama  env LANG=C.UTF-8 /bin/bash
    cd /notebooks
    conda activate llama_30b
    
  • Memory requirements

    (memory-requirements screenshot not reproduced here)

Dependency installation

Install the exact versions specified below; otherwise the SHA256 checksums will not match after merging:

pip install torch==1.13.1 -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install torchvision==0.14.1  -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install torchaudio==0.13.1 -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install transformers==4.28.1 -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install sentencepiece==0.1.97 -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install peft==0.3.0 -i https://pypi.tuna.tsinghua.edu.cn/simple
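
As a quick sanity check (a sketch, not part of the original guide), you can compare the versions pip actually installed against these pins before running the merge:

```python
# Compare installed package versions against the pinned versions above.
REQUIRED = {
    "torch": "1.13.1",
    "torchvision": "0.14.1",
    "torchaudio": "0.13.1",
    "transformers": "4.28.1",
    "sentencepiece": "0.1.97",
    "peft": "0.3.0",
}

def find_mismatches(installed, required):
    """Return {package: (installed, required)} for every version mismatch."""
    return {
        pkg: (installed.get(pkg), ver)
        for pkg, ver in required.items()
        if installed.get(pkg) != ver
    }

if __name__ == "__main__":
    from importlib.metadata import version, PackageNotFoundError
    installed = {}
    for pkg in REQUIRED:
        try:
            installed[pkg] = version(pkg)
        except PackageNotFoundError:
            installed[pkg] = None  # not installed at all
    print(find_mismatches(installed, REQUIRED) or "all versions match")
```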

Code and model weight pull

Pull Chinese-LLaMA-Alpaca

git clone https://github.com/ymcui/Chinese-LLaMA-Alpaca.git

GitHub is occasionally flaky, so be patient. If the clone fails, run rm -rf Chinese-LLaMA-Alpaca and pull again.

Pull the llama-30b-hf model weights and code

git clone https://huggingface.co/decapoda-research/llama-30b-hf

Since the weight files are very large, if the clone fails, run rm -rf llama-30b-hf and pull again.
Pulling around noon is recommended, when speeds are relatively fast; the download takes roughly 2-3 hours, depending heavily on your network bandwidth.

Check the download size:

du -sh llama-30b-hf

output:

154G    llama-30b-hf

Note that du also counts the Git object store under .git, so the on-disk size is roughly double the ~77 GB of actual weight files.

View the file list:

ls -l llama-30b-hf/

output:

total 80723436
-rw-r--r-- 1 root root      10646 Jul  4 11:59 LICENSE
-rw-r--r-- 1 root root       8313 Jul  4 11:59 README.md
-rw-r--r-- 1 root root        427 Jul  4 11:59 config.json
-rw-r--r-- 1 root root        124 Jul  4 11:59 generation_config.json
-rw-r--r-- 1 root root 1337620210 Jul  4 13:53 pytorch_model-00000-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul  4 12:15 pytorch_model-00001-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul  4 12:15 pytorch_model-00002-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul  4 12:14 pytorch_model-00003-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul  4 12:15 pytorch_model-00004-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul  4 12:14 pytorch_model-00005-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul  4 12:14 pytorch_model-00006-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul  4 12:15 pytorch_model-00007-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul  4 12:14 pytorch_model-00008-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul  4 12:29 pytorch_model-00009-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul  4 12:29 pytorch_model-00010-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul  4 12:29 pytorch_model-00011-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul  4 12:30 pytorch_model-00012-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul  4 12:29 pytorch_model-00013-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul  4 12:29 pytorch_model-00014-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul  4 12:29 pytorch_model-00015-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul  4 12:30 pytorch_model-00016-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul  4 12:44 pytorch_model-00017-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul  4 12:44 pytorch_model-00018-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul  4 12:44 pytorch_model-00019-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul  4 12:45 pytorch_model-00020-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul  4 12:44 pytorch_model-00021-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul  4 12:44 pytorch_model-00022-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul  4 12:44 pytorch_model-00023-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul  4 12:45 pytorch_model-00024-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul  4 12:59 pytorch_model-00025-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul  4 12:59 pytorch_model-00026-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul  4 12:59 pytorch_model-00027-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul  4 12:59 pytorch_model-00028-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul  4 12:59 pytorch_model-00029-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul  4 13:00 pytorch_model-00030-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul  4 12:59 pytorch_model-00031-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul  4 13:00 pytorch_model-00032-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul  4 13:14 pytorch_model-00033-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul  4 13:14 pytorch_model-00034-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul  4 13:14 pytorch_model-00035-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul  4 13:14 pytorch_model-00036-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul  4 13:14 pytorch_model-00037-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul  4 13:14 pytorch_model-00038-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul  4 13:15 pytorch_model-00039-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul  4 13:15 pytorch_model-00040-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul  4 13:16 pytorch_model-00041-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul  4 13:29 pytorch_model-00042-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul  4 13:53 pytorch_model-00043-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul  4 13:29 pytorch_model-00044-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul  4 13:29 pytorch_model-00045-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul  4 13:53 pytorch_model-00046-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul  4 13:29 pytorch_model-00047-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul  4 13:48 pytorch_model-00048-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul  4 13:29 pytorch_model-00049-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul  4 13:29 pytorch_model-00050-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul  4 13:31 pytorch_model-00051-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul  4 13:44 pytorch_model-00052-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul  4 13:44 pytorch_model-00053-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul  4 13:44 pytorch_model-00054-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul  4 13:44 pytorch_model-00055-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul  4 13:44 pytorch_model-00056-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul  4 13:44 pytorch_model-00057-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul  4 13:44 pytorch_model-00058-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul  4 13:45 pytorch_model-00059-of-00061.bin
-rw-r--r-- 1 root root 1337620210 Jul  4 13:29 pytorch_model-00060-of-00061.bin
-rw-r--r-- 1 root root 1064974796 Jul  4 13:47 pytorch_model-00061-of-00061.bin
-rw-r--r-- 1 root root      47653 Jul  4 11:59 pytorch_model.bin.index.json
-rw-r--r-- 1 root root          2 Jul  4 11:59 special_tokens_map.json
-rw-r--r-- 1 root root     499723 Jul  4 13:44 tokenizer.model
-rw-r--r-- 1 root root        141 Jul  4 11:59 tokenizer_config.json

Pull the chinese-llama-lora-33b model weights and code

git clone https://huggingface.co/ziqingyang/chinese-llama-lora-33b

Check the download size:

du -sh chinese-llama-lora-33b

output:

5.5G    chinese-llama-lora-33b

View the file list:

ls -l chinese-llama-lora-33b

output:

total 2836532
-rw-r--r-- 1 root root        315 Jul  4 15:46 README.md
-rw-r--r-- 1 root root        421 Jul  4 15:46 adapter_config.json
-rw-r--r-- 1 root root 2903823997 Jul  4 15:51 adapter_model.bin
-rw-r--r-- 1 root root         72 Jul  4 15:46 special_tokens_map.json
-rw-r--r-- 1 root root     757958 Jul  4 15:46 tokenizer.model
-rw-r--r-- 1 root root        166 Jul  4 15:46 tokenizer_config.json

Combine model weights

First merge the weights in pth format, so that the merged model can be verified:

cd ./Chinese-LLaMA-Alpaca
mkdir ./Chinese-LLaMA-33B-pth
python scripts/merge_llama_with_chinese_lora.py \
    --base_model ../llama-30b-hf/ \
    --lora_model ../chinese-llama-lora-33b/ \
    --output_type pth  \
    --output_dir ./Chinese-LLaMA-33B-pth

The output model weight file is saved to: ./Chinese-LLaMA-33B-pth
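
The core operation of the merge script can be sketched as follows (with NumPy, not the project's actual code): the LoRA adapter stores two low-rank factors A and B, and merging folds the scaled product into the frozen base weight.

```python
import numpy as np

def merge_lora(base, lora_A, lora_B, alpha, r):
    """Return the merged weight W' = W + (alpha / r) * B @ A."""
    return base + (alpha / r) * lora_B @ lora_A

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4))   # frozen base weight
A = rng.standard_normal((2, 4))   # rank-2 down-projection
B = rng.standard_normal((4, 2))   # rank-2 up-projection
merged = merge_lora(W, A, B, alpha=4.0, r=2)
print(merged.shape)  # → (4, 4)
```

The merged matrix has the same shape as the base weight, which is why the resulting checkpoint can be used as a drop-in replacement for the original model.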

Check SHA256 after merge

Generate the checksums:

cd ./Chinese-LLaMA-33B-pth
sha256sum consolidated.0*

output:

054e9b7dffa3b92a053ca32acac6e22b27c184ed2b8563f8e44e6570ba416357  consolidated.00.pth
a0fe86c45a0819f45a509776d82778b7de75fbff8d37afa97159b24de5448b7b  consolidated.01.pth
13df5f74dc7bc1204076b1febef818fb3cec978de27bf8fc85c70e7d62282df9  consolidated.02.pth
f4f28106c343c5804613faa9852f29fbc60764366bcb0d37ef2811a17be2d336  consolidated.03.pth

The reference SHA256 values for Chinese-LLaMA-33B are:

054e9b7dffa3b92a053ca32acac6e22b27c184ed2b8563f8e44e6570ba416357
a0fe86c45a0819f45a509776d82778b7de75fbff8d37afa97159b24de5448b7b
13df5f74dc7bc1204076b1febef818fb3cec978de27bf8fc85c70e7d62282df9
f4f28106c343c5804613faa9852f29fbc60764366bcb0d37ef2811a17be2d336

If the two lists match exactly, the merge succeeded; otherwise, check that the downloaded files are complete and intact.
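
The comparison above can also be automated with a short script (a sketch; the expected hashes mirror the reference list above):

```python
import hashlib
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Stream the file in chunks so multi-GB shards need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(directory, expected):
    """True if the hashes of the sorted consolidated.* shards match `expected`."""
    shards = sorted(Path(directory).glob("consolidated.0*.pth"))
    return [sha256_of(p) for p in shards] == expected

EXPECTED = [
    "054e9b7dffa3b92a053ca32acac6e22b27c184ed2b8563f8e44e6570ba416357",
    "a0fe86c45a0819f45a509776d82778b7de75fbff8d37afa97159b24de5448b7b",
    "13df5f74dc7bc1204076b1febef818fb3cec978de27bf8fc85c70e7d62282df9",
    "f4f28106c343c5804613faa9852f29fbc60764366bcb0d37ef2811a17be2d336",
]
```

Call `verify("./Chinese-LLaMA-33B-pth", EXPECTED)` and proceed only if it returns True.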

Then merge the weights in Hugging Face format:

cd ./Chinese-LLaMA-Alpaca
mkdir ./Chinese-LLaMA-33B
python scripts/merge_llama_with_chinese_lora.py \
    --base_model ../llama-30b-hf/ \
    --lora_model ../chinese-llama-lora-33b/ \
    --output_type huggingface  \
    --output_dir ./Chinese-LLaMA-33B

The output model weights are saved to ./Chinese-LLaMA-33B:

total 77G
-rw-r--r-- 1 root root  573 Jul  5 02:15 config.json
-rw-r--r-- 1 root root  132 Jul  5 02:15 generation_config.json
-rw-r--r-- 1 root root  12G Jul  5 02:15 pytorch_model-00001-of-00007.bin
-rw-r--r-- 1 root root  12G Jul  5 02:16 pytorch_model-00002-of-00007.bin
-rw-r--r-- 1 root root  12G Jul  5 02:16 pytorch_model-00003-of-00007.bin
-rw-r--r-- 1 root root  12G Jul  5 02:18 pytorch_model-00004-of-00007.bin
-rw-r--r-- 1 root root  12G Jul  5 02:19 pytorch_model-00005-of-00007.bin
-rw-r--r-- 1 root root  12G Jul  5 02:20 pytorch_model-00006-of-00007.bin
-rw-r--r-- 1 root root 7.6G Jul  5 02:21 pytorch_model-00007-of-00007.bin
-rw-r--r-- 1 root root  49K Jul  5 02:21 pytorch_model.bin.index.json
-rw-r--r-- 1 root root   72 Jul  5 02:15 special_tokens_map.json
-rw-r--r-- 1 root root 741K Jul  5 02:15 tokenizer.model
-rw-r--r-- 1 root root  727 Jul  5 02:15 tokenizer_config.json

Build a test page

Use text-generation-webui to build the page.

Pull text-generation-webui

git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui
pip install -r requirements.txt

The contents of requirements.txt:

accelerate==0.20.3
colorama
datasets
einops
flexgen==0.1.7
gradio_client==0.2.5
gradio==3.33.1
markdown
numpy
pandas
Pillow>=9.5.0
pyyaml
requests
safetensors==0.3.1
sentencepiece
tqdm
scipy

Load the model and start the webui

mkdir logs

python server.py --model-dir /notebooks/Chinese-LLaMA-Alpaca --model Chinese-LLaMA-33B --model_type LLaMA --listen --listen-host 0.0.0.0 --listen-port 8888 --auto-devices

  • Test

    Address: http://10.192.xx:28888/

    (interface screenshot not reproduced here)

  • My inference speed:

    Output generated in 832.65 seconds (0.09 tokens/s, 73 tokens, context 6, seed 233442323)
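
The reported rate follows directly from the log line's own numbers, tokens generated divided by wall-clock time:

```python
# Reproduce the webui's tokens/s figure from the log line above.
def tokens_per_second(tokens, seconds):
    return tokens / seconds

rate = tokens_per_second(73, 832.65)
print(f"{rate:.2f} tokens/s")  # → 0.09 tokens/s
```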

References

  • https://github.com/ymcui/Chinese-LLaMA-Alpaca
  • Building the interface with text-generation-webui
  • https://github.com/ymcui/Chinese-LLaMA-Alpaca/blob/main/SHA256.md

Origin blog.csdn.net/zengNLP/article/details/131556190