Open-source Chinese ChatGPT alternative, deployed locally (super easy to use)!

Today I tried ChatGLM, a super easy-to-use ChatGPT-style model that can be deployed locally with very good results. After testing, it can basically stand in for the beta version of Wenxin Yiyan (ERNIE Bot).

Table of contents

1. What is ChatGLM?

2. Local deployment

2.1 Model download

2.2 Model Deployment

2.3 Model running

2.3.1 Enter directly in the command line for question and answer

2.3.2 Using the gradio library to generate a question-and-answer page 

3. Comparing the model with ChatGPT and GPT4All

3.1 ChatGLM

3.2 ChatGPT

3.3 GPT4All

4. Summary


1. What is ChatGLM?

ChatGLM-6B is an open-source bilingual (Chinese-English) dialogue language model. It is based on the General Language Model (GLM) architecture and has 6.2 billion parameters. Combined with model quantization, it can be deployed locally on consumer-grade graphics cards (only 6 GB of VRAM is required at the INT4 quantization level). ChatGLM-6B uses technology similar to ChatGPT's, optimized for Chinese Q&A and dialogue. After training on roughly 1T tokens of Chinese and English text, supplemented by supervised fine-tuning, feedback bootstrapping, reinforcement learning from human feedback (RLHF), and other techniques, the 6.2-billion-parameter ChatGLM-6B can already generate answers that align quite well with human preferences.

2. Local deployment

2.1 Model download

Demo download address:

GitHub - ZhangErling/ChatGLM-6B: a version of ChatGLM-6B (An Open Bilingual Dialogue Language Model) that provides Windows deployment documentation

2.2 Model Deployment

1. Open the project folder with PyCharm;

2. Use pip to install the dependencies: pip install -r requirements.txt. Version 4.27.1 of the transformers library is recommended, but in theory anything not lower than 4.23.1 will work;
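If you want to check your installed version against that 4.23.1 floor, note that a plain string comparison gets dotted versions wrong (lexically, "4.9.0" looks newer than "4.23.1"). A minimal numeric comparison sketch:

```python
def version_at_least(version: str, minimum: str) -> bool:
    """Compare dotted version strings numerically, e.g. '4.27.1' >= '4.23.1'."""
    as_tuple = lambda v: tuple(int(part) for part in v.split("."))
    return as_tuple(version) >= as_tuple(minimum)

# The recommended 4.27.1 clears the 4.23.1 minimum:
print(version_at_least("4.27.1", "4.23.1"))  # True
# String comparison would wrongly accept 4.9.0 ('9' > '2' lexically);
# numeric comparison correctly rejects it:
print(version_at_least("4.9.0", "4.23.1"))   # False
```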

2.3 Model running

There are two demo scripts in the .../ChatGLM/ directory:

2.3.1 Enter directly in the command line for question and answer

(1) Modify the model path. Edit the cli_demo.py code and, on lines 5 and 6, replace the original model path "THUDM/ChatGLM-6B" with "model" (the local model folder).

(2) Choose a quantization level. If your card has more than 14 GB of VRAM, you can skip this step and run without quantization. If you only have 6 GB or 10 GB, append quantize(4) or quantize(8) on line 6 of the code, as follows:

# With 6 GB of VRAM, use 4-bit quantization
model = AutoModel.from_pretrained("model", trust_remote_code=True).half().quantize(4).cuda()

# With 10 GB of VRAM, use 8-bit quantization
model = AutoModel.from_pretrained("model", trust_remote_code=True).half().quantize(8).cuda()
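The VRAM thresholds above can be sanity-checked with a back-of-envelope calculation: 6.2 billion parameters at 16, 8, or 4 bits each. This counts weights only; activations and cache need extra headroom, which is why unquantized FP16 wants a ~14 GB card rather than ~12 GB. A rough sketch:

```python
PARAMS = 6.2e9  # ChatGLM-6B parameter count

def weights_gib(bits_per_param: int) -> float:
    """Approximate size of the model weights alone, in GiB."""
    return PARAMS * bits_per_param / 8 / 1024**3

print(f"FP16: {weights_gib(16):.1f} GiB")  # ~11.5 GiB -> ~14 GB card
print(f"INT8: {weights_gib(8):.1f} GiB")   # ~5.8 GiB  -> 10 GB card
print(f"INT4: {weights_gib(4):.1f} GiB")   # ~2.9 GiB  -> 6 GB card
```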

(3) Run cli_demo.py

2.3.2 Using the gradio library to generate a question-and-answer page

Alternatively, use the gradio library to generate a question-and-answer web page (the effect is shown in Section 3).

(1) Install the gradio library:

pip install gradio

(2) Modify the model path. Edit the web_demo.py code and, on lines 5 and 6, replace the original model path "THUDM/ChatGLM-6B" with "model" (the local model folder).

(3) Choose a quantization level. If your card has more than 14 GB of VRAM, you can skip this step and run without quantization. If you only have 6 GB or 10 GB, append quantize(4) or quantize(8) on line 5 of the code, as follows:

# With 6 GB of VRAM, use 4-bit quantization
model = AutoModel.from_pretrained("model", trust_remote_code=True).half().quantize(4).cuda()

# With 10 GB of VRAM, use 8-bit quantization
model = AutoModel.from_pretrained("model", trust_remote_code=True).half().quantize(8).cuda()

(4) Run web_demo.py 

The model loading process is as follows:

3. Comparing the model with ChatGPT and GPT4All

After running web_demo.py, the browser opens automatically and displays the following interface. Normal dialogue works, and the response speed is very fast.

3.1 ChatGLM

Ask ChatGLM: "It takes 10 minutes to steam one bun; how long does it take to steam ten buns?" Its answer is very reasonable.

3.2 ChatGPT

Ask ChatGPT the same question: "It takes 10 minutes to steam one bun; how long does it take to steam ten buns?" Its answer is a bit more simplistic.

3.3 GPT4All

In a previous blog post we introduced GPT4All, which only supports English dialogue. Asking it related questions in English, we found its performance is not as good as ChatGLM's or ChatGPT's.

4. Summary

ChatGLM is easy to deploy and understands Chinese well. Its advantage is that after deployment it needs no Internet connection and no account login, so it is very private and secure. Its disadvantage is that it cannot incrementally learn the latest information from the Internet, and expanding its knowledge base requires additional training samples.

Origin blog.csdn.net/weixin_43734080/article/details/130016026