A hands-on guide to deploying ChatGLM2-6B, the latest version of Tsinghua University's large model, on Linux

Preparation:

# Download the project source code
git clone https://github.com/THUDM/ChatGLM2-6B
# Switch to the project root directory
cd ChatGLM2-6B
# Install dependencies
pip install -r requirements.txt
# Install the web dependency
pip install gradio

If the installation runs into problems, you can try installing torch manually.
1. The first step is to create a virtual environment and activate it:

conda create -n ChatGLM2 python=3.10.6
conda activate ChatGLM2 

2. Check the CUDA version with nvidia-smi; on this machine it is 12.0.
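
For reference, the command itself:

nvidia-smi

The supported CUDA version appears in the "CUDA Version" field at the top right of the output header.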

3. Install torch. The first way is via the PyTorch official website: https://pytorch.org/

After several attempts, I found that installing with conda is genuinely fast and painless. My earlier attempts to install torch with pip kept hitting timeout errors; that has to do with the server's network, so it depends on your situation.
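
The exact command is generated by the selector on the PyTorch homepage, so check the site for the current one; as a reference, a typical conda command at the time (CUDA 11.8 builds, which a driver reporting 12.0 can run) looked like this:

conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia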

# Install dependencies
pip install -r requirements.txt

2. Preparations. While the dependency packages are installing, you can also start manually downloading the model package.

1. Project file preparation
The whole project needs to be cloned from two remote repositories: the source code on GitHub and the model on Hugging Face.

For the source code, since the total size is small, you can either download the zip package from the webpage and unpack it, or use the git command (git must be installed beforehand) to clone it into a local folder:

git clone https://github.com/THUDM/ChatGLM2-6B

For the model, there are 7 large checkpoint files, so a direct clone may take too long or fail on an unstable network connection. Instead you can download the large and small files separately. The large files can be downloaded manually from Tsinghua Cloud. The small files are mainly the model implementation files; they are few in number and small in size (11 files in total, including tokenizer.model) and can be fetched in two ways: either download them one by one from the Hugging Face page, or use the GIT_LFS_SKIP_SMUDGE variable to skip the large files and clone the whole project in one go (Git LFS must be installed beforehand):

GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/THUDM/chatglm2-6b

Finally, copy the manually downloaded large files into the clone, replacing the LFS pointer files, to assemble the complete model.
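
For example (a sketch; ~/downloads is a placeholder for wherever you saved the files from Tsinghua Cloud, and the checkpoint names follow the pattern shown on the Hugging Face page):

# overwrite the LFS pointer files with the real checkpoints and tokenizer
cp ~/downloads/pytorch_model-0000*-of-00007.bin chatglm2-6b/
cp ~/downloads/tokenizer.model chatglm2-6b/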

Pitfalls to watch out for:

1) This command fails in PowerShell, which reports that GIT_LFS_SKIP_SMUDGE is not a recognized command; it runs fine in a git bash terminal (a PowerShell workaround is sketched below);

2) The "large files" skipped by this command are not just the 7 .bin files; they also include the 1.02 MB tokenizer.model.
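
If you do need PowerShell, setting the environment variable separately first is the standard workaround (this is generic PowerShell syntax, not something specific to this project):

$env:GIT_LFS_SKIP_SMUDGE = "1"
git clone https://huggingface.co/THUDM/chatglm2-6b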

In theory, the model files can be stored anywhere. Following the official demo video, place the model folder alongside the source files, so the directory structure of the whole project is as follows:

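Since the original screenshot is not available, here is a reconstruction of that layout; it assumes the model folder is renamed to model inside the source directory, which matches the relative path used in the code change below:

ChatGLM2-6B/                  # source code cloned from GitHub
    cli_demo.py
    web_demo.py
    web_demo2.py
    requirements.txt
    ...
    model/                    # model files from Hugging Face / Tsinghua Cloud
        config.json
        tokenizer.model
        pytorch_model-00001-of-00007.bin ... pytorch_model-00007-of-00007.bin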

Matching this directory structure, edit web_demo.py in the source directory and replace both occurrences of THUDM/chatglm2-6b with model:

tokenizer = AutoTokenizer.from_pretrained("model", trust_remote_code=True)
model = AutoModel.from_pretrained("model", trust_remote_code=True).cuda()
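
If GPU memory is tight, the official README also documents a quantized loading path; adapted here to the local "model" directory:

# load with INT4 quantization to reduce GPU memory usage (per the official README)
model = AutoModel.from_pretrained("model", trust_remote_code=True).quantize(4).cuda()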


If you want to interact with the model from the command line, make the same change in cli_demo.py; likewise for web_demo2.py.
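
Before launching any demo, it can help to sanity-check the local model from a Python shell. This mirrors the quick-start snippet in the official README, with the path swapped for the local "model" folder:

from transformers import AutoTokenizer, AutoModel

# load the tokenizer and model from the local "model" directory
tokenizer = AutoTokenizer.from_pretrained("model", trust_remote_code=True)
model = AutoModel.from_pretrained("model", trust_remote_code=True).cuda()
model = model.eval()

# one round of chat; history carries the conversation context
response, history = model.chat(tokenizer, "你好", history=[])
print(response)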

3. Run the model
cd into the folder containing web_demo.py, activate the conda environment created earlier, and then execute:

python web_demo.py

After it starts, the Gradio web page opens in the browser (by default on local port 7860).
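
One extra note for a headless Linux server (my addition, not in the original steps): Gradio binds to 127.0.0.1 by default, so to reach the page from another machine, edit the launch(...) call at the bottom of web_demo.py along these lines (the exact existing line may differ):

demo.queue().launch(share=False, server_name="0.0.0.0", server_port=7860)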

Pitfalls to watch out for:

1) Do not leave a VPN/proxy enabled when using web_demo, otherwise it fails with the error Expecting value: line 1 column 1 (char 0);

2) If you use web_demo2, you need to additionally install streamlit and streamlit-chat as the official instructions note (install command below), and the startup command is

streamlit run web_demo2.py
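
The extra packages from pitfall 2) install in the usual way:

pip install streamlit streamlit-chat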

4. Experience
Inference speed is clearly improved over the previous generation, but the gain in answer quality is less noticeable. Looking forward to the team releasing larger models such as 13B, 30B, and 65B in the future.
