Today I tried ChatGLM, a very easy-to-use ChatGPT-style model that can be deployed locally, and the results are quite good. After testing, it can basically stand in for the beta version of Wenxin Yiyan (ERNIE Bot).
Table of contents
1. What is ChatGLM?
2. Local deployment
2.1 Model download
2.2 Model deployment
2.3 Model running
2.3.1 Enter directly in the command line for question and answer
2.3.2 Use the gradio library to generate a question-and-answer page
3. Comparing the model with ChatGPT and GPT4All
4. Summary
1. What is ChatGLM?
ChatGLM-6B is an open-source bilingual (Chinese-English) dialogue language model based on the General Language Model (GLM) architecture, with 6.2 billion parameters. Combined with model quantization, it can be deployed locally on consumer-grade graphics cards (as little as 6 GB of VRAM at the INT4 quantization level). ChatGLM-6B uses techniques similar to ChatGPT, optimized for Chinese Q&A and dialogue. After training on roughly 1T tokens of Chinese and English text, supplemented by supervised fine-tuning, feedback bootstrapping, reinforcement learning from human feedback, and other techniques, the 6.2-billion-parameter ChatGLM-6B can generate answers that align quite well with human preferences.
2. Local deployment
2.1 Model download
Demo download address:
2.2 Model Deployment
1. Open the project file with Pycharm;
2. Use pip to install the dependencies: pip install -r requirements.txt. For the transformers library, version 4.27.1 is recommended, though in theory any version no lower than 4.23.1 should work.
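To confirm the installed transformers version meets the requirement, a quick check can be sketched as follows. The version strings are the ones quoted above; the helper names are my own and not part of the project:

```python
# Compare dotted version strings numerically. This simplification ignores
# pre-release suffixes, which is fine for plain x.y.z release versions.
def version_tuple(v: str) -> tuple:
    return tuple(int(part) for part in v.split("."))

def meets_minimum(installed: str, minimum: str = "4.23.1") -> bool:
    return version_tuple(installed) >= version_tuple(minimum)

print(meets_minimum("4.27.1"))  # True  (the recommended version)
print(meets_minimum("4.20.0"))  # False (too old)
```

On a live install, you would pass `transformers.__version__` as the `installed` argument.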
2.3 Model running
There are two demo scripts in the .../ChatGLM/ directory:
2.3.1 Enter directly in the command line for question and answer
(1) Modify the model path. Edit cli_demo.py and, on lines 5 and 6, replace the original "THUDM/ChatGLM-6B" with the local "model" folder path.
(2) Choose a quantization level. If your GPU has more than 14 GB of VRAM, you can skip this step and run without quantization. If you have only 6 GB or 10 GB, append quantize(4) or quantize(8) respectively on line 6, as follows:
# 4-bit quantization fits in 6 GB of VRAM
model = AutoModel.from_pretrained("model", trust_remote_code=True).half().quantize(4).cuda()
# 8-bit quantization fits in 10 GB of VRAM
model = AutoModel.from_pretrained("model", trust_remote_code=True).half().quantize(8).cuda()
(3) Run cli_demo.py
2.3.2 Use the gradio library to generate a question-and-answer page
Use the gradio library to generate a question-and-answer page (the effect is shown in Section 3).
(1) Install the gradio library:
pip install gradio
(2) Modify the model path. Edit web_demo.py and, on lines 5 and 6, replace the original "THUDM/ChatGLM-6B" with the local "model" folder path.
(3) Choose a quantization level. If your GPU has more than 14 GB of VRAM, you can skip this step and run without quantization. If you have only 6 GB or 10 GB, append quantize(4) or quantize(8) respectively on line 5, as follows:
# 4-bit quantization fits in 6 GB of VRAM
model = AutoModel.from_pretrained("model", trust_remote_code=True).half().quantize(4).cuda()
# 8-bit quantization fits in 10 GB of VRAM
model = AutoModel.from_pretrained("model", trust_remote_code=True).half().quantize(8).cuda()
(4) Run web_demo.py
The model loading process is as follows:
3. Comparing the model with ChatGPT and GPT4All
After running web_demo.py, a browser window opens automatically and shows the interface below. Normal dialogue works, and responses are fast.
3.1 ChatGLM
Ask ChatGLM: "It takes 10 minutes to steam 1 bun, how long does it take to steam 10 buns?" The answer is very reasonable.
3.2 ChatGPT
Ask ChatGPT the same question: "It takes 10 minutes to steam 1 bun, how long does it take to steam 10 buns?" Its answer is a bit more simplistic.
3.3 GPT4All
In a previous blog post we introduced GPT4All, which supports English-only dialogue. Asking it related questions in English, we found its results are not as good as ChatGLM's or ChatGPT's.
4. Summary
ChatGLM is easy to deploy and understands Chinese well. Its advantage is that, once deployed, it needs no internet connection and no account login, which makes it very private. Its disadvantage is that it cannot incrementally learn the latest information from the internet, and expanding its knowledge base requires additional training samples.