AI Dialogue: Getting Started with a Local ChatGLM2-6B Deployment

Recap

What have I done this month? I spent about two weeks on AI painting: generating architectural drawings with Stable Diffusion and training LoRA models. I have basically learned to control the model and can generate what I intend, which is a satisfying feeling; without that control, typing a sentence and taking whatever picture the AI produces is left entirely to chance, which I'd call invalid communication. Then I suddenly wanted to try AI song synthesis. To avoid copyright disputes, I synthesized some songs with my own voice. Since they weren't recorded in a studio and the voice data was limited, the training results were somewhat unsatisfactory, but compared with my own singing they were already impressive. Having practiced AI voice, I moved on to AI dialogue, the technology closest to human reasoning and the one with the widest range of applications, so it had to be tried.

After playing with these gadgets, what comes next? In practice there are two levels of industry application for AI: AI algorithms and applications, and AI compute scheduling. Last year I took part in the architecture design for China Mobile Cloud's "Eastern Data, Western Computing" (东数西算) project, so AI compute scheduling should be fine for me (compute resources really were wasted last year); here let's just look at the algorithm side. Besides, I have exams in the second half of the year and can't spend too much time on this. I've spent my weekends over the past two months on it, and I can't let that effort go to waste.

Build and deployment

Environment information

**OS:** Win11
**GPU:** RTX 3070 32G
**Python:** 3.10

| Quantization level | Minimum GPU memory (inference) | Minimum GPU memory (fine-tuning) |
| --- | --- | --- |
| FP16 (standard) | 13 GB | 14 GB |
| INT8 | 8 GB | 9 GB |
| INT4 | 6 GB | 7 GB |
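The table above maps directly to a simple selection rule. As a quick sanity check, here is a small helper (the thresholds come from the table; the function itself is just illustrative, not part of the project):

```python
# Minimum GPU memory (GB) needed for inference, taken from the table above
INFERENCE_VRAM_GB = {"FP16": 13, "INT8": 8, "INT4": 6}

def pick_precision(vram_gb):
    """Return the highest-precision quantization level that fits, or None."""
    for level in ("FP16", "INT8", "INT4"):
        if vram_gb >= INFERENCE_VRAM_GB[level]:
            return level
    return None

# An 8 GB card lands on INT8 for inference
print(pick_precision(8))  # → INT8
```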
Install dependencies

Project download

# Download the project
git clone https://github.com/THUDM/ChatGLM2-6B

# Install dependencies (Douban mirror for faster access in China)
cd ChatGLM2-6B
pip install -r requirements.txt -i https://pypi.douban.com/simple

Model download

Cloud disk link: https://pan.baidu.com/s/1AIerQMpvw7yO34Gq9BFxAQ
extraction code: 5uzo

Place the downloaded model files in the THUDM folder.
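The demo scripts load the model from the relative path THUDM/chatglm2-6b, so the folder name must match exactly. A sketch of the expected layout (the file names follow the Hugging Face repo and may differ slightly in your download):

```shell
# Create the folder the demo scripts expect, then move the downloaded files into it
mkdir -p THUDM/chatglm2-6b
# Expected contents (per the Hugging Face repo; verify against your download):
#   config.json  tokenizer.model  tokenizer_config.json
#   modeling_chatglm.py  quantization.py  pytorch_model-*.bin
ls THUDM
```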

Graphics card driver

# Check the local GPU information
nvidia-smi

Install the CUDA Toolkit: https://developer.nvidia.com/cuda-toolkit-archive

Note: Choose a version no higher than the CUDA version reported by nvidia-smi above; 11.8 is recommended.

**Pytorch dependencies**

# Install the PyTorch build matching CUDA 11.8
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

**CUDA Check**

# CUDA availability check (True means CUDA is usable)
python -c "import torch; print(torch.cuda.is_available());"

Configuration options

**Precision selection**

In api.py, cli_demo.py, web_demo.py, web_demo2.py, and the other demo scripts, the model is loaded as follows:

# By default the model loads in FP16 and needs about 13 GB of GPU memory
model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True).cuda()

# If GPU memory is limited, quantize on load; only 4-bit and 8-bit quantization are currently supported
model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True).quantize(4).cuda()

# If memory is still insufficient, load an already-quantized model directly
model = AutoModel.from_pretrained("THUDM/chatglm2-6b-int4", trust_remote_code=True).cuda()

# Without a GPU, inference can also run on the CPU, but much more slowly; it needs about 32 GB of RAM (the quantized model needs about 5 GB)
model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True).float()

# If RAM is insufficient, load the already-quantized model on the CPU
model = AutoModel.from_pretrained("THUDM/chatglm2-6b-int4", trust_remote_code=True).float()

To run a quantized model on the CPU, gcc and OpenMP are also required. Most Linux distributions ship them by default; on Windows, check the openmp option when installing TDM-GCC.
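Putting the loading options together, here is a minimal console-dialogue sketch built on the model.chat API that the project's demo scripts use. The trim_history helper is my own addition (not part of the project) to keep the conversation context, and thus memory use, bounded; the 4-bit quantized load assumes a small GPU:

```python
def trim_history(history, max_turns=5):
    """Keep only the last max_turns (query, response) pairs to bound context growth."""
    return history[-max_turns:]

def chat_loop():
    # Heavy imports live inside the function so the helper above stays importable on its own
    from transformers import AutoModel, AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)
    model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True).quantize(4).cuda()
    model = model.eval()
    history = []
    while True:
        query = input("User: ")
        if query.strip().lower() in ("exit", "quit"):
            break
        response, history = model.chat(tokenizer, query, history=history)
        history = trim_history(history)  # drop the oldest turns
        print("ChatGLM2-6B:", response)

# Start chatting (requires the downloaded model and a CUDA GPU): chat_loop()
```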

Running

**Console**

# Run in the console
python cli_demo.py

**Web version**

# Option 1
python web_demo.py

# Option 2: install the streamlit_chat module first
pip install streamlit_chat -i https://pypi.douban.com/simple
streamlit run web_demo2.py

Note: Because access to Gradio's servers is slow from within China, when demo.queue().launch(share=True, inbrowser=True) is enabled all traffic is forwarded through the Gradio server, which noticeably degrades the streaming (typewriter) experience. The default startup has therefore been changed to share=False; if you need public network access, change it back to share=True.

**API**

# Install the fastapi and uvicorn modules
pip install fastapi uvicorn -i https://pypi.douban.com/simple

# Run the API server
python api.py

By default the API is served on local port 8000 and is called via POST:

curl -X POST "http://127.0.0.1:8000" \
     -H 'Content-Type: application/json' \
     -d '{"prompt": "你好", "history": []}'
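The same call can be made from Python using only the standard library. The response fields used below (response, history) follow what the project's api.py returns; treat them as an assumption and check against your version:

```python
import json
from urllib import request

API_URL = "http://127.0.0.1:8000"  # default address used by api.py

def build_payload(prompt, history=None):
    """Serialize the JSON body expected by api.py."""
    return json.dumps({"prompt": prompt, "history": history or []}).encode("utf-8")

def ask(prompt, history=None):
    """POST a prompt to the local API and return the decoded JSON reply."""
    req = request.Request(
        API_URL,
        data=build_payload(prompt, history),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

# With api.py running:
#   result = ask("你好")
#   print(result["response"])         # the model's reply
#   next_history = result["history"]  # pass back on the next call
```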

Summary

Appendix

Reference: https://juejin.cn/post/7250348861238870053

Origin: blog.csdn.net/weixin_36532747/article/details/131743255