[Large model] ChatGLM2-6B-32K with a 32K context length is here, open source and commercially usable

Introduction

ChatGLM2-6B-32K

ChatGLM2-6B-32K builds on ChatGLM2-6B, further strengthening long-text understanding, and can better handle contexts of up to 32K tokens.

Its positional encoding is updated using the positional-interpolation method, and the model is trained with a 32K context length during the dialogue phase.
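To make the idea concrete, here is a minimal sketch of positional interpolation: instead of feeding the model position indices it never saw during pre-training, every index is scaled down so a 32K sequence reuses the position range seen at training time. The constants below (2K base length) follow the generic method; the exact scaling inside ChatGLM2-6B-32K's own code may differ.

```python
TRAIN_LEN = 2048      # context length assumed for pre-training
TARGET_LEN = 32768    # extended context length (32K)

def interpolated_positions(seq_len, train_len=TRAIN_LEN, target_len=TARGET_LEN):
    # Position m is treated as m * (train_len / target_len), so even the
    # last token of a 32K sequence stays inside the trained range [0, train_len).
    scale = train_len / target_len
    return [m * scale for m in range(seq_len)]

pos = interpolated_positions(TARGET_LEN)
```

After this scaling, fine-tuning at 32K only has to adapt the model to fractional positions rather than to entirely unseen ones.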

In practice, if your context length is mostly within 8K, we recommend using ChatGLM2-6B; if you need to handle contexts longer than 8K, we recommend ChatGLM2-6B-32K.

ChatGLM2-6B-32K new features

ChatGLM2-6B-32K is an extended version of the open source Chinese-English bilingual dialogue model ChatGLM2-6B:

  1. Training data: ChatGLM2-6B-32K uses the hybrid objective function of GLM and has been pre-trained on 1.4T Chinese and English tokens, followed by human-preference alignment training.
  2. Longer context: Based on FlashAttention, the base model's context length is extended from ChatGLM-6B's 2K to 32K, and a 32K context length is used for training in the dialogue phase, allowing many more rounds of dialogue.
  3. More efficient inference: Based on Multi-Query Attention, ChatGLM2-6B-32K has faster inference and lower GPU memory usage: with the official model implementation, inference is 42% faster than the first generation, and under INT4 quantization the dialogue length supported by 6 GB of GPU memory increases from 1K to 8K.
  4. More open license: The ChatGLM2-6B-32K weights are fully open to academic research, and free commercial use is also permitted after completing the registration questionnaire.
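The memory saving in point 3 comes from many query heads sharing a single key/value head, which shrinks the KV cache. The following is a minimal NumPy sketch of that core idea only; the real model uses fused kernels, KV caching, and its own head grouping, so this is not ChatGLM2's actual implementation.

```python
import numpy as np

def multi_query_attention(q, k, v):
    # q: (n_heads, seq, d); k, v: (seq, d) -- ONE shared key/value head
    # serves all query heads, so only one K/V tensor must be cached.
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                 # (n_heads, seq, seq)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                            # (n_heads, seq, d)

# Toy shapes: 4 query heads attend over an 8-token sequence of width 16.
rng = np.random.default_rng(0)
out = multi_query_attention(rng.normal(size=(4, 8, 16)),
                            rng.normal(size=(8, 16)),
                            rng.normal(size=(8, 16)))
```

Compared with standard multi-head attention, the K/V tensors here are a factor of `n_heads` smaller, which is what lowers memory usage and speeds up decoding.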

Environment configuration

Environment setup

For the detailed setup process, see:
[AI in practice] The strongest open-source 6B Chinese language model, ChatGLM2-6B, built from scratch
- ChatGLM2-6B setup

Install dependencies

pip install protobuf transformers==4.30.2 cpm_kernels torch>=2.0 gradio mdtex2html sentencepiece accelerate

Code and model weight pull

Pull ChatGLM2-6B

git clone https://github.com/THUDM/ChatGLM2-6B

GitHub is occasionally flaky, so be patient; if the clone fails, pull it again.

Pull the chatglm2-6b-32k model weights and code

cd ChatGLM2-6B
git clone https://huggingface.co/THUDM/chatglm2-6b-32k

The weight files are very large, so the clone may fail; if it does, run rm -rf chatglm2-6b-32k and pull again.

View file list:

ls -l chatglm2-6b-32k

output:

.gitattributes  1.52 kB
MODEL_LICENSE   4.13 kB
README.md   9.04 kB
config.json 1.25 kB
configuration_chatglm.py    2.31 kB
modeling_chatglm.py 51.5 kB
pytorch_model-00001-of-00007.bin    1.83 GB
pytorch_model-00002-of-00007.bin    1.97 GB
pytorch_model-00003-of-00007.bin    1.93 GB
pytorch_model-00004-of-00007.bin    1.82 GB
pytorch_model-00005-of-00007.bin    1.97 GB
pytorch_model-00006-of-00007.bin    1.93 GB
pytorch_model-00007-of-00007.bin    1.05 GB
pytorch_model.bin.index.json    20.4 kB
quantization.py 14.7 kB
tokenization_chatglm.py 10.1 kB
tokenizer.model 1.02 MB
tokenizer_config.json   

If downloading the large files fails, fetch the model files manually as follows:

wget https://huggingface.co/THUDM/chatglm2-6b-32k/resolve/main/pytorch_model-00001-of-00007.bin
wget https://huggingface.co/THUDM/chatglm2-6b-32k/resolve/main/pytorch_model-00002-of-00007.bin
...the remaining shards follow the same pattern
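Rather than typing seven wget lines by hand, the shard URLs can be generated in a loop (shard names taken from the file listing above). This is a sketch; wget's -c flag resumes a partially downloaded file.

```shell
base="https://huggingface.co/THUDM/chatglm2-6b-32k/resolve/main"
# Write the full shard list to a file, one URL per line.
for i in 1 2 3 4 5 6 7; do
  echo "${base}/pytorch_model-0000${i}-of-00007.bin"
done > shard_urls.txt
# wget -c -i shard_urls.txt   # uncomment to start (or resume) the download
```

Re-running the same command after an interruption simply resumes the unfinished shards.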

Terminal test

Enter the Python environment:

python3

Enter the following code:

from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("./chatglm2-6b-32k", trust_remote_code=True)
model = AutoModel.from_pretrained("./chatglm2-6b-32k", trust_remote_code=True).half().cuda()
model = model.eval()
response, history = model.chat(tokenizer, "你好", history=[])
print(response)
response, history = model.chat(tokenizer, "晚上睡不着应该怎么办", history=history)
print(response)

output:

(Translated from the model's Chinese reply:) Not being able to sleep at night can make you anxious or uncomfortable, but here are some methods that may help you fall asleep:

1. Keep a regular sleep schedule: a regular schedule helps you build healthy sleep habits and fall asleep more easily. Try to go to bed at the same time every day and get up at the same time.
2. Create a comfortable sleep environment: make sure the environment is comfortable, quiet, dark, and at a suitable temperature. Use comfortable bedding and keep the room ventilated.
3. Relax body and mind: do something relaxing before bed, such as taking a hot bath, listening to soft music, or reading an interesting book; this helps relieve tension and anxiety and makes it easier to fall asleep.
4. Avoid caffeinated drinks: caffeine is a stimulant that affects sleep quality. Try to avoid caffeinated drinks such as coffee, tea, and cola before bed.
5. Avoid non-sleep activities in bed: doing things unrelated to sleep in bed, such as watching movies, playing games, or working, can interfere with your sleep.
6. Try breathing techniques: deep breathing is a relaxation technique that can relieve tension and anxiety and help you fall asleep. Try inhaling slowly, holding for a few seconds, then exhaling slowly.

If these methods do not help, consider consulting a doctor or sleep specialist for further advice.
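The example above loads the model in FP16 with .half().cuda(). If GPU memory is tight, the INT4 mode mentioned earlier (8K dialogue within 6 GB) is exposed by the quantize() helper shipped in the checkpoint's own remote code (quantization.py). A hedged sketch, assuming the same local path as above:

```python
def load_int4(model_dir="./chatglm2-6b-32k"):
    # quantize(4) comes from the remote code bundled with the checkpoint;
    # it converts the weights to INT4 after loading, trading a little
    # quality for much lower GPU memory usage.
    from transformers import AutoModel
    model = AutoModel.from_pretrained(model_dir, trust_remote_code=True)
    return model.quantize(4).cuda().eval()
```

The returned model is used exactly like the FP16 one, via model.chat(tokenizer, ...).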

Web testing

Use Gradio to build the page

Install Gradio

pip install gradio   -i https://pypi.tuna.tsinghua.edu.cn/simple

Load the model and start the service

Modify the model path;

vi web_demo.py

Change lines 6 and 7:

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True).cuda()

to:

tokenizer = AutoTokenizer.from_pretrained("./chatglm2-6b-32k", trust_remote_code=True)
model = AutoModel.from_pretrained("./chatglm2-6b-32k", trust_remote_code=True).cuda()

Start the service:

python web_demo.py

Test address:

http://10.192.x.x:7860/
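By default, Gradio binds to localhost only, so the page is not reachable from another machine on the LAN (as the 10.192.x.x address above implies). If that happens, the launch() call at the bottom of web_demo.py can be changed to bind to all interfaces; the fragment below uses standard Gradio parameters, but check your copy of web_demo.py for the exact call it makes:

```python
demo.queue().launch(server_name="0.0.0.0", server_port=7860, share=False)
```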

References

  1. https://github.com/THUDM/ChatGLM2-6B
  2. https://huggingface.co/THUDM/chatglm2-6b-32k
  3. https://huggingface.co/THUDM/chatglm2-6b
  4. https://github.com/THUDM/GLM

Origin blog.csdn.net/zengNLP/article/details/132233402