Reference:
https://www.listera.top/ji-xu-zhe-teng-xia-chinese-llama-alpaca/
https://blog.csdn.net/qq_38238956/article/details/130113599
CMake Windows installation reference: https://blog.csdn.net/weixin_42357472/article/details/131314105
llama.cpp download and compile
1. Download:
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
2. Compile
mkdir build
cd build
cmake ..
cmake --build . --config Release
3. Test run
cd bin\Release
.\main.exe -h
Running the LLaMA-7B model
Reference:
https://zhuanlan.zhihu.com/p/638427280
Model download:
https://huggingface.co/nyanko7/LLaMA-7B/tree/main
After downloading, create the LLamda\7B directory under llama.cpp-master\models\ and put the downloaded files in it.
1. Convert the 7B model to ggml FP16 format
convert.py is in the llama.cpp-master root directory:
python3 convert.py models/LLamda/7B/
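Before running the conversion, it can help to confirm the checkpoint files are in place. A minimal pre-flight sketch, assuming the standard LLaMA-7B layout (consolidated.00.pth, params.json, tokenizer.model all in the model directory); the helper name `missing_files` is hypothetical:

```python
import os

# Standard LLaMA-7B checkpoint files (assumed layout; tokenizer.model may
# also live one directory up in some distributions).
EXPECTED = ["consolidated.00.pth", "params.json", "tokenizer.model"]

def missing_files(model_dir):
    """Return the expected checkpoint files that are absent from model_dir."""
    return [f for f in EXPECTED if not os.path.isfile(os.path.join(model_dir, f))]
```

For example, `missing_files("models/LLamda/7B")` lists anything that still needs downloading before convert.py is run.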
2. Quantize the model to 4 bits (q4_0 method)
quantize.exe is in llama.cpp-master\build\bin\Release; quantization shrinks the model from about 13 GB to under 4 GB.
.\quantize.exe D:\llm\llama.cpp-master\models\LLamda\7B\ggml-model-f16.bin D:\llm\llama.cpp-master\models\LLamda\7B\ggml-model-q4_0.bin q4_0
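The size drop can be sanity-checked with back-of-the-envelope arithmetic: FP16 stores 2 bytes per weight, while q4_0 stores each block of 32 weights as one fp16 scale (2 bytes) plus 32 four-bit quants (16 bytes), i.e. 18 bytes per 32 weights. A sketch assuming roughly 6.7e9 parameters for the 7B model:

```python
# Back-of-the-envelope size estimate for the 7B model
# (assumes ~6.7e9 parameters; not an exact file size).
N_PARAMS = 6.7e9

# FP16: 2 bytes per weight.
fp16_gb = N_PARAMS * 2 / 1e9

# q4_0: 18 bytes per block of 32 weights (one fp16 scale + 32 x 4-bit quants).
q4_0_gb = N_PARAMS * 18 / 32 / 1e9

print(f"fp16 ~ {fp16_gb:.1f} GB, q4_0 ~ {q4_0_gb:.1f} GB")
```

This lands at roughly 13 GB for fp16 and just under 4 GB for q4_0, matching the observed file sizes.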
3. Run
Run main.exe interactively from the command line in llama.cpp-master\build\bin\Release:
.\main.exe -m D:\llm\llama.cpp-master\models\LLamda\7B\ggml-model-q4_0.bin -n 128 --repeat_penalty 1.0 --color -i -r "User:" -f D:\llm\llama.cpp-master\prompts\chat-with-bob.txt
LLaMA's Chinese support is not very good: the output is only roughly intelligible. If you need Chinese support, consider other models.
Llama-2: you can also download third-party pre-converted ggml models directly.
Reference address:
https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML
https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML
Running on Windows consumes a lot of memory (32 GB is nearly exhausted) and generation is also very slow; however, the 13B Llama-2 chat model can reply in Chinese directly.
## Run
.\main.exe -m "C:\Users\loong\Downloads\llama-2-13b-chat.ggmlv3.q4_0.bin" -n 128 --repeat_penalty 1.0 --color -i -r "User:" -f D:\llm\llama.cpp-master\prompts\chat-with-bob.txt
Chinese-Llama-2 (second-generation Chinese model)
Model download:
https://huggingface.co/soulteary/Chinese-Llama-2-7b-ggml-q4
## Run
.\main.exe -m "C:\Users\loong\Downloads\Chinese-Llama-2-7b-ggml-q4.bin" -n 128 --repeat_penalty 1.0 --color -i -r "User:" -f D:\llm\llama.cpp-master\prompts\chat-with-bob.txt