Chinese-LLaMA-Alpaca in Practice


Project address: https://github.com/ymcui/Chinese-LLaMA-Alpaca

Fine-tuning Chinese-Alpaca

This project, built on Chinese data:

  • Open-sources a Chinese LLaMA large model (7B, 13B) pre-trained on Chinese text data
  • Open-sources a Chinese Alpaca large model (7B, 13B) further fine-tuned with instruction data

Use text-generation-webui to build an interface
The following uses the text-generation-webui tool as an example to walk through local deployment without merging the model weights.
1. Create a new conda environment first.

conda create -n textgen python=3.10
conda activate textgen
pip install torch torchvision torchaudio

2. Download the chinese-alpaca-lora-7b weights: https://drive.google.com/file/d/1JvFhBpekYiueWiUL3AF1TtaWDb3clY5D/view?usp=sharing
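
If you prefer to script this step, the gdown package can fetch the file from Google Drive by its ID; a minimal sketch (the output file name is an assumption, adjust it to the actual file):

# pip install gdown
import gdown

# File ID taken from the Google Drive link above; the output name is an assumption.
gdown.download(id="1JvFhBpekYiueWiUL3AF1TtaWDb3clY5D",
               output="chinese-alpaca-lora-7b.zip", quiet=False)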

# Clone text-generation-webui
git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui
pip install -r requirements.txt

# Put the downloaded LoRA weights into the loras folder
ls loras/chinese-alpaca-lora-7b
adapter_config.json  adapter_model.bin  special_tokens_map.json  tokenizer_config.json  tokenizer.model

Three ways to download the llama-7B base model

  • Download the llama-7B model file in HuggingFace format through transformers-cli
transformers-cli download decapoda-research/llama-7b-hf --cache-dir ./llama-7b-hf
  • Download via snapshot_download:
pip install huggingface_hub
python   # start a Python shell, then run:
from huggingface_hub import snapshot_download
snapshot_download(repo_id="decapoda-research/llama-7b-hf", cache_dir="./llama-7b-hf")
  • Download through the git command (git-lfs needs to be installed in advance)
git clone https://huggingface.co/decapoda-research/llama-7b-hf

I use the second method here.
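
The HuggingFace cache stores the model under a hashed snapshots/ directory, so the snapshot path in the cp commands below will differ on your machine. As an alternative, here is a small sketch that resolves the path via snapshot_download (which simply returns the cached location once the files are present) and copies the three tokenizer files; it assumes it is run from the text-generation-webui directory:

import shutil
from pathlib import Path
from huggingface_hub import snapshot_download

# Resolve the local snapshot directory (re-uses the cache created earlier).
snapshot_dir = Path(snapshot_download(repo_id="decapoda-research/llama-7b-hf",
                                      cache_dir="models/llama-7b-hf"))

# Copy the Chinese-Alpaca tokenizer files next to the base model weights.
lora_dir = Path("loras/chinese-alpaca-lora-7b")
for name in ("tokenizer.model", "special_tokens_map.json", "tokenizer_config.json"):
    shutil.copy(lora_dir / name, snapshot_dir / name)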

# Put the HuggingFace-format llama-7B model files into the models folder
ls models/llama-7b-hf
pytorch_model-00001-of-00002.bin pytorch_model-00002-of-00002.bin config.json pytorch_model.bin.index.json generation_config.json
# Copy the tokenizer files from the LoRA weights into models/llama-7b-hf
cp loras/chinese-alpaca-lora-7b/tokenizer.model ~/text-generation-webui/models/llama-7b-hf/models--decapoda-research--llama-7b-hf/snapshots/5f98eefcc80e437ef68d457ad7bf167c2c6a1348/

cp loras/chinese-alpaca-lora-7b/special_tokens_map.json ~/text-generation-webui/models/llama-7b-hf/models--decapoda-research--llama-7b-hf/snapshots/5f98eefcc80e437ef68d457ad7bf167c2c6a1348/

cp loras/chinese-alpaca-lora-7b/tokenizer_config.json ~/text-generation-webui/models/llama-7b-hf/models--decapoda-research--llama-7b-hf/snapshots/5f98eefcc80e437ef68d457ad7bf167c2c6a1348/
# Modify modules/LoRA.py, around line 28
shared.model.resize_token_embeddings(len(shared.tokenizer))
shared.model = PeftModel.from_pretrained(shared.model, Path(f"{shared.args.lora_dir}/{lora_names[0]}"), **params)
# Now it should run; see https://github.com/oobabooga/text-generation-webui/wiki/Using-LoRAs
# python server.py --model llama-7b-hf --lora chinese-alpaca-lora-7b
# Load in 8-bit (int8)
python server.py --model llama-7b-hf/models--decapoda-research--llama-7b-hf/snapshots/5f98eefcc80e437ef68d457ad7bf167c2c6a1348/ --lora chinese-alpaca-lora-7b --load-in-8bit

Error reported:
RuntimeError: Error(s) in loading state_dict for PeftModelForCausalLM:
size mismatch for base_model.model.model.embed_tokens.weight: copying a param with shape torch.Size([49954, 4096]) from checkpoint, the shape in current model is torch.Size([32000, 4096]).
size mismatch for base_model.model.lm_head.weight: copying a param with shape torch.Size([49954, 4096]) from checkpoint, the shape in current model is torch.Size([32000, 4096]).

Fix: the Chinese-Alpaca LoRA expects a vocabulary of 49,954 tokens, while the base LLaMA model has 32,000, so resize the embeddings to 49,954 explicitly before loading the LoRA. Replace the lines in modules/LoRA.py with:
shared.model.resize_token_embeddings(49954)
assert shared.model.get_input_embeddings().weight.size(0) == 49954
shared.model = PeftModel.from_pretrained(shared.model, Path(f"{shared.args.lora_dir}/{lora_names[0]}"), **params)
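
To sanity-check this outside the web UI, the same resize-then-attach sequence can be reproduced with transformers and peft; a minimal sketch, assuming the directory layout used above:

import torch
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import PeftModel

base_dir = "models/llama-7b-hf/models--decapoda-research--llama-7b-hf/snapshots/5f98eefcc80e437ef68d457ad7bf167c2c6a1348"
lora_dir = "loras/chinese-alpaca-lora-7b"

tokenizer = LlamaTokenizer.from_pretrained(lora_dir)   # Chinese-Alpaca tokenizer, 49954 tokens
model = LlamaForCausalLM.from_pretrained(base_dir, torch_dtype=torch.float16)

# Resize embeddings to the extended vocabulary before attaching the LoRA adapter.
model.resize_token_embeddings(49954)
model = PeftModel.from_pretrained(model, lora_dir)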

Exposing the interface publicly
To create a public link, set share=True in launch().

Observed result: the generated Chinese responses are relatively short.

Example:

Below is an instruction that describes a task.
Write a response that appropriately completes the request.
### Instruction:
我得了流感,请帮我写一封请假条
### Response:
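
The example follows the standard Alpaca prompt template; a tiny sketch of how such a prompt string can be assembled:

ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)

def build_prompt(instruction: str) -> str:
    # Wrap the user instruction in the Alpaca-style template shown above.
    return ALPACA_TEMPLATE.format(instruction=instruction)

print(build_prompt("我得了流感,请帮我写一封请假条"))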


Deploy llama.cpp

Download the combined model weights

Download the combined (base + LoRA merged) model weights locally, then upload them to the server.
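
If you want to produce the combined weights yourself instead of downloading them (the Chinese-LLaMA-Alpaca repository also documents an official merge procedure), a minimal PEFT-based sketch; the alpaca-combined output directory matches the directory referenced below, and the input paths are assumptions following the earlier steps:

import torch
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import PeftModel

base_dir = "llama-7b-hf"                 # HuggingFace-format LLaMA-7B (assumption)
lora_dir = "chinese-alpaca-lora-7b"      # Chinese-Alpaca LoRA weights (assumption)
out_dir  = "alpaca-combined"             # merged model used by the steps below

tokenizer = LlamaTokenizer.from_pretrained(lora_dir)
model = LlamaForCausalLM.from_pretrained(base_dir, torch_dtype=torch.float16)
model.resize_token_embeddings(len(tokenizer))        # 32000 -> 49954

model = PeftModel.from_pretrained(model, lora_dir)
model = model.merge_and_unload()                     # bake the LoRA into the base weights

model.save_pretrained(out_dir)
tokenizer.save_pretrained(out_dir)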

# Clone the project
git clone https://github.com/ggerganov/llama.cpp
# Build
cd llama.cpp && make
# Create the model directories
cd llama.cpp && mkdir -p zh-models/7B

After putting all the files from alpaca-combined into the zh-models/7B directory, run:

mv llama.cpp/zh-models/7B/tokenizer.model llama.cpp/zh-models/
ls llama.cpp/zh-models/

This will display: 7B  tokenizer.model

Execute the conversion process

python convert.py zh-models/7B/

This generates zh-models/7B/ggml-model-f16.bin.

Quantize FP16 model to 4-bit

We further convert the FP16 model to a 4-bit quantized model; the trailing 2 in the command selects the q4_0 quantization type.

./quantize ./zh-models/7B/ggml-model-f16.bin ./zh-models/7B/ggml-model-q4_0.bin 2

Then run inference as needed (-m selects the model, -f loads a prompt template, -p supplies the prompt, -n caps the number of generated tokens; substitute ggml-model-q4_0.bin to use the quantized model):

./main -m ./zh-models/7B/ggml-model-f16.bin --color -f ./prompts/alpaca.txt -p "详细介绍一下北京的名胜古迹:" -n 512


Original post: https://blog.csdn.net/dzysunshine/article/details/130873188