AIGC large model ChatGLM2-6B: local deployment and hands-on experience with a Chinese-developed ChatGPT alternative

1 Introduction to ChatGLM2-6B

ChatGLM is a Chinese-English bilingual dialogue model developed by Zhipu AI, a company that commercializes research from Tsinghua University. ChatGLM is trained on top of the GLM-130B base model. It has knowledge across many domains, coding ability, and common-sense reasoning; it supports interaction with users through natural-language dialogue and can handle a wide range of natural-language tasks, such as chat, intelligent question answering, article and script writing, event extraction, and code generation.

ChatGLM2-6B Upgrade Highlights

The second-generation version of ChatGLM-6B retains the strengths of the first-generation model, such as smooth dialogue and a low deployment threshold, while adding many new features:

(1) More powerful performance

Building on development experience with the first-generation ChatGLM model, the base model of ChatGLM2-6B has been fully upgraded. ChatGLM2-6B uses the mixed objective function of GLM and has undergone pre-training on 1.4T Chinese and English tokens, followed by human-preference alignment training. Evaluation results show large gains over the original model on datasets such as MMLU (+23%), C-Eval (+33%), GSM8K (+571%), and BBH (+60%), making it strongly competitive among open-source models of the same size.
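For intuition, GLM's objective is autoregressive blank infilling: a contiguous span of the input is replaced by a mask token, and the model learns to generate the missing span left-to-right after a start-of-piece marker. A toy illustration (the token names [MASK]/[sop]/[eop] follow GLM's convention; the function itself is hypothetical, not from the GLM codebase):

```python
# Toy illustration of GLM-style autoregressive blank infilling:
# a contiguous span is masked out of the input, then appended after
# a start-of-piece marker so the model can predict it left-to-right.
def blank_infill(tokens, start, end):
    corrupted = tokens[:start] + ["[MASK]"] + tokens[end:]
    target = ["[sop]"] + tokens[start:end] + ["[eop]"]
    return corrupted + target

print(blank_infill(["the", "cat", "sat", "on", "the", "mat"], 1, 3))
# ['the', '[MASK]', 'on', 'the', 'mat', '[sop]', 'cat', 'sat', '[eop]']
```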

(2) Longer context

Based on FlashAttention, the researchers extended the context length of the base model from ChatGLM-6B's 2K to 32K, and trained with an 8K context length in the dialogue stage, allowing more rounds of dialogue. However, the current version of ChatGLM2-6B has limited ability to understand very long documents in a single turn; this will be a focus of optimization in subsequent iterations.

(3) More efficient inference

Based on Multi-Query Attention, ChatGLM2-6B has faster inference and lower GPU memory usage. With the official implementation, inference speed is 42% higher than the first generation, and under INT4 quantization the dialogue length supported by 6 GB of GPU memory increases from 1K to 8K tokens.
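In standard multi-head attention, each head keeps its own key/value projections; multi-query attention instead shares a single key/value head across all query heads, which shrinks the KV cache (and thus memory traffic) during generation. A minimal NumPy sketch of the idea (a toy illustration, not ChatGLM2's actual implementation):

```python
# Toy multi-query attention (MQA): every query head attends over ONE
# shared key/value head, so the KV cache shrinks by a factor of num_heads.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_query_attention(q, k, v):
    # q: (num_heads, seq_len, head_dim); k, v: (seq_len, head_dim), shared by all heads
    scores = q @ k.T / np.sqrt(q.shape[-1])   # (num_heads, seq_len, seq_len)
    return softmax(scores) @ v                # (num_heads, seq_len, head_dim)

rng = np.random.default_rng(0)
heads, seq, dim = 8, 4, 16
out = multi_query_attention(rng.normal(size=(heads, seq, dim)),
                            rng.normal(size=(seq, dim)),
                            rng.normal(size=(seq, dim)))
print(out.shape)  # (8, 4, 16)
```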

(4) More open license

The ChatGLM2-6B weights are fully open for academic research, and commercial use is also permitted after obtaining official written permission.

Compared with the original model, ChatGLM2-6B delivers substantial improvements across dimensions such as mathematical logic, knowledge reasoning, and long-document understanding.

2 ChatGLM2-6B local deployment

2.1 conda environment preparation

For conda environment preparation, see: Anaconda

2.2 Operating environment installation

# Create and activate a dedicated conda environment
conda create -n chatglm python=3.9
conda activate chatglm

# Fetch the ChatGLM2-6B repository and install its Python dependencies
git clone https://github.com/THUDM/ChatGLM2-6B
cd ChatGLM2-6B
pip install -r requirements.txt

# Download the model weights (requires git-lfs for the large weight shards)
mkdir THUDM
cd THUDM
git clone https://huggingface.co/THUDM/chatglm2-6b

After the above steps are completed, view the downloaded model, as shown below:

[root@localhost ChatGLM2-6B]# ll THUDM/chatglm2-6b/
total 12195716
-rw-r--r-- 1 root root       1263 Aug  2 10:42 config.json
-rw-r--r-- 1 root root       2304 Aug  2 10:42 configuration_chatglm.py
-rw-r--r-- 1 root root      51910 Aug  2 10:42 modeling_chatglm.py
-rw-r--r-- 1 root root       4198 Aug  2 10:42 MODEL_LICENSE
-rw-r--r-- 1 root root 1827780615 Aug  2 10:45 pytorch_model-00001-of-00007.bin
-rw-r--r-- 1 root root 1968299005 Aug  2 10:48 pytorch_model-00002-of-00007.bin
-rw-r--r-- 1 root root 1927414561 Aug  2 10:51 pytorch_model-00003-of-00007.bin
-rw-r--r-- 1 root root 1815225523 Aug  2 10:53 pytorch_model-00004-of-00007.bin
-rw-r--r-- 1 root root 1968299069 Aug  2 10:56 pytorch_model-00005-of-00007.bin
-rw-r--r-- 1 root root 1927414561 Aug  2 10:59 pytorch_model-00006-of-00007.bin
-rw-r--r-- 1 root root 1052808067 Aug  2 11:01 pytorch_model-00007-of-00007.bin
-rw-r--r-- 1 root root      20645 Aug  2 11:01 pytorch_model.bin.index.json
-rw-r--r-- 1 root root      14880 Aug  2 11:01 quantization.py
-rw-r--r-- 1 root root       8175 Aug  2 11:01 README.md
-rw-r--r-- 1 root root      10318 Aug  2 11:01 tokenization_chatglm.py
-rw-r--r-- 1 root root        256 Aug  2 11:01 tokenizer_config.json
-rw-r--r-- 1 root root    1018370 Aug  2 11:01 tokenizer.model
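Each pytorch_model shard must download completely; an interrupted git-lfs pull commonly leaves shards missing. As a quick sanity check, a small helper (hypothetical, not part of the repository) can compare the shards named in pytorch_model.bin.index.json against the files on disk:

```python
# Hypothetical helper: report any weight shards listed in the sharded
# checkpoint index (pytorch_model.bin.index.json) that are missing on disk.
import json
from pathlib import Path

def missing_shards(model_dir):
    model_dir = Path(model_dir)
    index = json.loads((model_dir / "pytorch_model.bin.index.json").read_text())
    shards = set(index["weight_map"].values())  # tensor name -> shard filename
    return sorted(s for s in shards if not (model_dir / s).exists())
```

Usage: `missing_shards("THUDM/chatglm2-6b")` returning an empty list means every shard named in the index is present (it does not verify file integrity).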

2.3 Modify the code

[root@localhost ChatGLM2-6B]# vi web_demo.py 

Change the last line of web_demo.py to set share=True; after the change it looks like this:

demo.queue().launch(share=True, inbrowser=True)
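Note that share=True tunnels the demo through gradio.live, making it reachable from the public internet. If you only need access within the LAN, Gradio's standard launch parameters can instead bind the server to all local interfaces (a sketch; server_name and server_port are regular Gradio options, not part of the original web_demo.py edit):

```python
# Alternative: serve the demo on the LAN only, without the public tunnel
demo.queue().launch(server_name="0.0.0.0",  # listen on all network interfaces
                    server_port=7860,       # default Gradio port
                    inbrowser=True)
```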

2.4 Start the web demo

[root@localhost ChatGLM2-6B]# python web_demo.py 

When you see the following output, startup has succeeded:

Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:06<00:00,  1.13it/s]
/root/anaconda3/envs/chat/lib/python3.9/site-packages/gradio/components/textbox.py:259: UserWarning: The `style` method is deprecated. Please set these arguments in the constructor instead.
  warnings.warn(
Running on local URL:  http://127.0.0.1:7860
Running on public URL: https://ac0a819376990775ad.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)

The demo can also be accessed from other machines via the returned public URL:

https://ac0a819376990775ad.gradio.live

The startup interface is as follows:

3 Using ChatGLM2-6B

3.1 Web Q&A

3.1.1 Knowledge quiz


3.1.2 Text generation


3.1.3 Mathematical logic

3.1.4 Language Understanding

3.1.5 Common sense questions

3.1.6 Code Generation

3.1.7 Medical problems

3.1.8 Content Summary

3.2 Calling ChatGLM through code

Save the following code as test.py in the repository root:

# Load the tokenizer and model from the local THUDM/chatglm2-6b directory
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True, device='cuda')
model = model.eval()

# "What should tumor patients pay attention to in home nutrition?"
question = '肿瘤居家营养应该注意什么?'
response, history = model.chat(tokenizer, question, history=[])
print(response)

After the code is executed, the printout is as follows:

[root@localhost ChatGLM2-6B]# python test.py 

Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:08<00:00,  1.26s/it]
For tumor patients receiving home care, diet is a very important part of recovery. Here are some points to note for home nutrition:

1. Balanced diet: tumor patients need to take in sufficient protein, vitamins, minerals, and fiber to support recovery and prevent complications. Patients are advised to eat more nutrient-rich foods such as vegetables, fruits, whole grains, legumes, nuts, and seeds.

2. Control calories and fat: tumor patients need to control calorie intake to avoid weight gain that could affect treatment. It is advisable to cut back on high-calorie, high-fat foods such as fried food, desserts, and fatty meats.

3. Increase protein intake: protein is an essential nutrient; for tumor patients, adequate protein helps the body repair and recover. Patients are advised to increase protein intake, including legumes, meat, fish, and protein powder.

4. Control sodium intake: tumor patients need to limit sodium to avoid elevated blood pressure that could affect treatment. It is advisable to reduce salt intake, including sea salt, table salt, and other high-sodium foods.

5. Avoid irritating foods: some tumor patients may experience nausea, vomiting, and similar symptoms, which certain foods can aggravate. Patients are advised to avoid spicy and greasy foods, coffee, alcohol, and other irritants.

6. Mind food hygiene: tumor patients need to maintain food hygiene to avoid food poisoning and other infectious illnesses. Patients are advised to wash hands frequently, keep raw and cooked food separate, and store food properly.

During home care, tumor patients should follow the dietary advice of a doctor or dietitian to support recovery and prevent complications.


Origin blog.csdn.net/lsb2002/article/details/132074878