Practical Deployment of ChatGLM3, Tsinghua's Open-Source Large Language Model
ChatGLM3 is a new-generation conversational pre-trained model jointly released by Zhipu AI and the KEG Lab of Tsinghua University.
Project repository: https://github.com/THUDM/ChatGLM3
Installation Environment
Using a dedicated virtual environment is recommended.
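For example, with conda (the environment name chatglm3 and Python 3.10 here are arbitrary choices):

conda create -n chatglm3 python=3.10
conda activate chatglm3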
git clone https://github.com/THUDM/ChatGLM3
cd ChatGLM3
pip install -r requirements.txt
The recommended version of the transformers library is 4.30.2, and torch 2.0 or above is recommended for the best inference performance.
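For example, to pin those versions explicitly (adjust the torch install to match your CUDA setup if needed):

pip install "transformers==4.30.2" "torch>=2.0"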
Download the model files
git lfs install
git clone https://www.modelscope.cn/ZhipuAI/chatglm3-6b.git
This can take a while: the FP16 weights of the 6B model are roughly 12 GB.
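Alternatively, if the modelscope Python package is installed, a snapshot download along these lines should also work (the files land in ModelScope's default local cache; a sketch, not the only way to do it):

from modelscope import snapshot_download

# downloads the weights into the local ModelScope cache and returns the directory path
model_dir = snapshot_download("ZhipuAI/chatglm3-6b")
print(model_dir)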
Test whether the installation is successful
In the inference examples below, replace THUDM/chatglm3-6b with the local path where you downloaded the model.
GPU inference
Inference requires more than 13 GB of GPU memory.
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
# device='cuda' loads the weights directly onto the GPU (FP16 by default)
model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True, device='cuda')
model = model.eval()
response, history = model.chat(tokenizer, "你好", history=[])
print(response)
response, history = model.chat(tokenizer, "晚上睡不着应该怎么办", history=history)
print(response)
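For token-by-token output, the model also exposes a stream_chat helper, which the official demos use. A minimal sketch, reusing the model and tokenizer loaded above (stream_chat yields the full response generated so far on each step):

printed = 0
for response, history in model.stream_chat(tokenizer, "你好", history=[]):
    # response is the cumulative text; print only the newly generated part
    print(response[printed:], end="", flush=True)
    printed = len(response)
print()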
CPU inference
Inference requires more than 32 GB of RAM.
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
# .float() keeps the weights in FP32 for CPU execution
model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).float()
model = model.eval()
response, history = model.chat(tokenizer, "你好", history=[])
print(response)
response, history = model.chat(tokenizer, "晚上睡不着应该怎么办", history=history)
print(response)
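On CPU, throughput depends heavily on how many threads PyTorch uses. As an optional tuning sketch (the thread count here is an arbitrary example; set it to your physical core count):

import torch

torch.set_num_threads(8)  # arbitrary example value; match your physical core count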
Quantized inference
Quantization reduces the GPU memory needed for inference at some cost in response quality; with 4-bit quantization the 6B model needs roughly 6 GB of GPU memory.
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
# quantize(4) converts the weights to INT4 before moving them to the GPU
model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).quantize(4).cuda()
model = model.eval()
response, history = model.chat(tokenizer, "你好", history=[])
print(response)
response, history = model.chat(tokenizer, "晚上睡不着应该怎么办", history=history)
print(response)
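To turn any of the snippets above into a simple interactive session, you can wrap model.chat in a loop. A minimal sketch, reusing the model and tokenizer from the examples above:

history = []
while True:
    query = input("\nUser: ")
    if query.strip().lower() in ("exit", "quit"):
        break
    # model.chat returns the reply plus the updated conversation history
    response, history = model.chat(tokenizer, query, history=history)
    print("ChatGLM3:", response)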
If you run into any problems, feel free to discuss them in the comments.
For more technical discussion, join the group: 130856474