Running a local "ChatGPT" with no GPU and no network (updated with a Chinese model)

Image generation took two or three years to go from DALL·E to Stable Diffusion and then to on-device deployment on Apple hardware.
Chatbots, by contrast, reached edge deployment after only a few months. If Apple's next silicon update doubles the NPU and memory, even the Apple Watch stands to benefit, and at the earliest, iOS 18-era Macs, iPads, and iPhones could ship with something like this built in.
llama.cpp is an open-source project that ordinary people can actually use: a model of roughly ChatGPT's level can even be deployed on a Raspberry Pi. This article records the steps I went through and the problems I hit along the way.
The GitHub repo has since been updated and the pitfalls I stepped into are no longer painful, but I haven't had time to study the changes yet, so I saw it through with the old code and paste it below.
Update: this walkthrough has been revised for the latest version of the repo (2023-04-07), so you can follow it as-is without downloading the old code from the link below.

Link: https://pan.baidu.com/s/1J9FBxSDhmBcqAnHx3rGhEQ  Extraction code: q5xv
With that, plus the model from the Baidu Cloud link further down, you should be able to build your own language model.

The upstream repo: https://github.com/ggerganov/llama.cpp

Download and build

Open a terminal and run the following commands:

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make

# For Windows with CMake, build like this instead:
cd <path_to_llama_folder>
mkdir build
cd build
cmake ..
cmake --build . --config Release
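
If the build went through, a quick sanity check is to print the help text. This is just my habit: with the Makefile build the main binary sits in the repo root, while the CMake build (I believe) puts it under build/bin.

./main -h            # Makefile build
./build/bin/main -h  # CMake build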


Model download

Downloading the model is the most troublesome part; fortunately someone has already shared the weights:

git clone https://huggingface.co/nyanko7/LLaMA-7B

If that doesn't work for you, here is a Baidu Cloud mirror:
Link: https://pan.baidu.com/s/1ZC2SCG9X8jZ-GysavQl29Q  Extraction code: 4ret


Then install the Python dependencies and convert the model to FP16 format. This is where the first little bug appears.

python3 -m pip install torch numpy sentencepiece

# convert the 7B model to ggml FP16 format
python3 convert-pth-to-ggml.py models/7B/ 1


It will complain that a file cannot be found.
Open convert-pth-to-ggml.py, fix the "/tokenizer.model" path, and run it again: python3 convert-pth-to-ggml.py ./models/7B 1. (I also renamed the model directory while I was at it.)
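
If you'd rather not edit the script, an alternative that should also work (an assumption about that version of the script: it resolves tokenizer.model one level above the model folder, i.e. it expects models/tokenizer.model next to models/7B/) is simply to put the file where it is expected:

# copy the tokenizer next to the 7B folder (use your own download path)
cp /path/to/LLaMA-7B/tokenizer.model ./models/
python3 convert-pth-to-ggml.py ./models/7B 1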

Now the file is found, and the second bug appears...

I couldn't spot the problem at first, but after comparing my 7B folder against the files on the original Hugging Face page, I noticed the file sizes were completely different, which explains why a repo that should be tens of gigabytes cloned so quickly.
Open the model's page on Hugging Face, download the two large weight files directly from the browser, and replace the corresponding files in the 7B folder.
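The size mismatch is the classic symptom of Git LFS pointer files: Hugging Face stores the large weight files with LFS, and a plain git clone only fetches tiny text pointers if git-lfs isn't set up. If that is what happened to you, an alternative to downloading by hand (assuming git-lfs is installed) is:

# inside the cloned LLaMA-7B directory
git lfs install
git lfs pull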

Next, quantize the model to 4-bit format

# quantize the model to 4-bits
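# (in the version I used, the trailing 2 selects the quantization type: 2 = q4_0)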
./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin 2


Inference

# run the inference
./main -m ./models/7B/ggml-model-q4_0.bin -n 128


If you want a ChatGPT-style conversation, use the command below. -n controls the maximum length of the generated reply; --color colours the output so you can tell the AI from the human; -i runs in interactive mode; -r sets a reverse prompt (the string at which generation stops and hands control back to you); -f loads a whole prompt file; --repeat_penalty controls the penalty for repeated text in the reply; --temp is the temperature coefficient: the lower the value, the less random the reply, and vice versa.
It's much faster after the update.

./main -m ./models/7B/ggml-model-q4_0.bin -n 256 --repeat_penalty 1.0 --color -i -r "User:" -f prompts/chat-with-bob.txt

Let's open prompts/chat-with-bob.txt and have a look.
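The screenshot isn't reproduced here; for reference, in the version of the repo I used the file looked essentially like this (it may differ slightly in newer versions):

Transcript of a dialog, where the User interacts with an Assistant named Bob. Bob is helpful, kind, honest, good at writing, and never fails to answer the User's requests immediately and with precision.

User: Hello, Bob.
Bob: Hello. How may I help you today?
User: Please tell me the largest city in Europe.
Bob: Sure. The largest city in Europe is Moscow, the capital of Russia.
User: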

As you can see, this essentially gives the model a scenario to act out, and then you and the AI chat within that scenario.

My English name is Zale, and I named the bot Kangaroo, so I chat with it under those identities; you can modify the prompt below to suit your own preferences.

./main -m ./models/7B/ggml-model-q4_0.bin -n 256 --repeat_penalty 1.0 --color -i -r "Zale:" -f prompts/chat-with-zale.txt

Write the prompt into a text file (I saved mine as prompts/chat-with-zale.txt):

"Transcript of a dialog, where the Zale interacts with an Assistant named Kangaroo. Kangaroo is helpful, kind, honest, good at writing, and never fails to answer the Zale's requests immediately and with precision.

Zale: Hello, Kangaroo.
Kangaroo: Hello. How may I help you today?
Zale: Please tell me the largest city in Europe.
Kangaroo: Sure. The largest city in Europe is Moscow, the capital of Russia.
Zale:"


The replies are a bit dull, but for edge deployment this is a huge step forward!
An amusing discovery: it clearly understands my Chinese, yet it tells me it can't understand Chinese.
I also had some fairly entertaining conversations with it.

Chinese deployment

The Chinese-LLaMA-Alpaca repo from Harbin Institute of Technology:
https://github.com/ymcui/Chinese-LLaMA-Alpaca

git clone https://github.com/ymcui/Chinese-LLaMA-Alpaca.git

Download the Chinese model. Note that this is not the model you feed to llama.cpp directly: officially it is described as a Chinese "patch" (LoRA) for LLaMA, and it has to be merged with the original LLaMA/Alpaca weights before it can be used.

Install the dependencies

pip install git+https://github.com/huggingface/transformers
pip install sentencepiece
pip install peft
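
A quick way to check that everything installed and imports cleanly (just a sanity check, nothing project-specific):

python -c "import transformers, peft, sentencepiece; print(transformers.__version__, peft.__version__)"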

For convenience, I also placed the original LLaMA files in the working directory.

There are a few caveats to watch out for (the project README spells them out).

Check the SHA256 of each original file; the exact command differs by platform, for example:
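(The filename below is just an example; check every original weight file.)

shasum -a 256 consolidated.00.pth               # macOS
sha256sum consolidated.00.pth                   # Linux
certutil -hashfile consolidated.00.pth SHA256   # Windows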

Organize the original LLaMA files into a clean directory, roughly like the sketch below.
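Since the screenshot isn't reproduced here, this is roughly the layout I ended up with (an assumption based on what convert_llama_weights_to_hf.py expects: tokenizer.model at the top level and the 7B weights in a subfolder):

llama_7b/
├── tokenizer.model
└── 7B/
    ├── consolidated.00.pth
    └── params.json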

I installed transformers into a conda environment, so the path is rather long; just find wherever your own copy of convert_llama_weights_to_hf.py lives, for example as shown below.
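
If you're not sure where it is, one way to locate it (assuming your transformers version ships the LLaMA conversion script under models/llama/):

python -c "import os, transformers; print(os.path.join(os.path.dirname(transformers.__file__), 'models', 'llama', 'convert_llama_weights_to_hf.py'))"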

python /Users/kangaroo/miniconda3/envs/pytorch/lib/python3.10/site-packages/transformers/models/llama/convert_llama_weights_to_hf.py \
    --input_dir ./llama_7b \
    --model_size 7B \
    --output_dir ./llama_hf


Merge the models

python scripts/merge_llama_with_chinese_lora.py \
    --base_model ./llama_hf \
    --lora_model ./chinese_llama_lora_7b \
    --output_dir ./cn_llama


Then copy this folder to llama.cpp/models

Go back to llama.cpp and run the conversion and quantization again

python convert-pth-to-ggml.py models/cn_llama/ 1

./quantize ./models/cn_llama/ggml-model-f16.bin ./models/cn_llama/ggml-model-q4_0.bin 2

I kept the generation length short (-n 48) and just let the reply cut off; I'll look at it more carefully later.

./main -m ./models/cn_llama/ggml-model-q4_0.bin -n 48 --repeat_penalty 1.0 --color -i -r "Zale:" -f prompts/chat-with-zale.txt


./main -m models/cn_llama/ggml-model-q4_0.bin --color -f ./prompts/alpaca.txt -ins -c 2048 --temp 0.2 -n 256 --repeat_penalty 1.3
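
Two flags here weren't used earlier; my reading of them (./main -h lists the full set):

# -ins      run in instruction mode (Alpaca-style question and answer)
# -c 2048   context window size, in tokens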



Original post: https://blog.csdn.net/weixin_45569617/article/details/129553293