[Even Grandma Can Follow It] Deploy the open-source ChatGLM-6B on a cloud server so you too can have your own ChatGPT

1. Background

Hello everyone, I am Xiaojuan.

Recently, ChatGPT not only released GPT-4 but also gained the ability to browse the internet. I have to admire how fast AI iterates; I can barely keep up. But as everyone has noticed, with each ChatGPT update, OpenAI places more and more restrictions on open access. In the past you could reach GPT-3 from the domestic network without trouble, but now accounts get banned at every turn.

Therefore, today I will show you how to deploy ChatGLM-6B, open-sourced by Tsinghua University. Briefly: ChatGLM is a dialogue language model optimized for Chinese question answering and conversation. The currently released model has 6.2 billion parameters, and a larger 130-billion-parameter model is planned. Domestic ChatGLM can be expected to keep getting stronger.

ChatGLM's open source address: THUDM/ChatGLM-6B

Without further ado, let's look at the results directly. The following is a Chinese dialogue produced by ChatGLM (not ChatGPT).

(PS: At the end of the article we have prepared a free ChatGLM trial address and a way to try the compute platform for free, so be sure to read to the end.)


2. Preparations

According to the official documentation, ChatGLM-6B requires at least 13 GB of video memory (VRAM).

Things to prepare are as follows:

  • A GPU cloud server (16 GB of VRAM, 32 GB of RAM)
  • The GPU driver, CUDA, and the PyTorch framework installed on the server (the platform provides ready-made images, so you can install one directly)
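If you already have access to a machine and want to sanity-check it against the 13 GB VRAM requirement, here is a minimal sketch. It assumes the CSV output format of `nvidia-smi --query-gpu=memory.total --format=csv,noheader` (one line per GPU, e.g. `24576 MiB`):

```python
import subprocess

REQUIRED_GIB = 13  # minimum VRAM stated in the ChatGLM-6B docs

def parse_mib(line: str) -> int:
    """Parse one line of nvidia-smi CSV output, e.g. '24576 MiB' -> 24576."""
    return int(line.strip().split()[0])

def meets_requirement(total_mib: int, required_gib: int = REQUIRED_GIB) -> bool:
    """Check a GPU's total VRAM (in MiB) against the requirement (in GiB)."""
    return total_mib >= required_gib * 1024

def gpu_ok() -> bool:
    """Query all GPUs on this machine and check each one."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader"],
        text=True,
    )
    lines = [l for l in out.splitlines() if l.strip()]
    return bool(lines) and all(meets_requirement(parse_mib(l)) for l in lines)
```

For example, a 3090 (24 GB) passes, a T4 (16 GB) passes, while an 8 GB card does not.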

A word about choosing a server vendor. GPU servers are expensive, so Xiaojuan compared the GPU offerings of several large and small vendors. Only configurations that meet the requirements at a reasonable price are listed here.

| Vendor | Configuration | Price (CNY) | Notes |
| --- | --- | --- | --- |
| Alibaba Cloud | 4 cores, 15 GB RAM, NVIDIA T4 (16 GB VRAM) | 1878/month | Big-vendor service, but too expensive |
| Tencent Cloud | 10 cores, 40 GB RAM, NVIDIA T4 | 8.68/hour | Big-vendor service, but a dedicated GPU is slightly pricey |
| Huawei Cloud | 8 cores, 32 GB RAM, NVIDIA T4 (16 GB VRAM) | 3542/month | Too expensive |
| mistGPU | 8 cores, 32 GB RAM, NVIDIA 3090 (24 GB VRAM) | 4.5/hour | Downside: only 1 GB of free storage |
| Lanrui Xingzhou | 10 cores, 40 GB RAM, NVIDIA 3090 (24 GB VRAM) | 1.9/hour | Recommended: high spec, low price; the 3090 is currently on special |

Here we use servers from the compute platform Lanrui Xingzhou, whose main advantage is price. Note that the GPU server should be billed pay-as-you-go: you are charged only for the time the instance is running, and nothing while it is shut down.
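To see what pay-as-you-go means in practice, here is a rough cost estimate using the hourly rate from the comparison table above (the usage pattern is hypothetical, just for illustration):

```python
def usage_cost(rate_per_hour: float, hours: float) -> float:
    """Pay-as-you-go: you are only billed for the hours the instance runs."""
    return round(rate_per_hour * hours, 2)

# e.g. the 3090 at 1.9/hour, used 3 hours a day for 30 days:
# usage_cost(1.9, 3 * 30) -> 171.0, far below a 1878/month dedicated instance
```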

3. Server Configuration

This step is simple: purchase a server and set up the environment.

3.1 Registration and use

Open the Lanrui Xingzhou official registration page: https://www.lanrui-ai.com/register?invitation_code=4104

When registering an account, fill in the invitation code 4104 and the platform will give you free credit.

You can try a server for free, and you can also top up your account from the upper-right corner.


3.2 Purchase a server and install an image

On the 算力市场 (compute market) page, purchase the required configuration. Here I choose the 3090-24G option, then click the 使用 (Use) button to enter the image-installation screen.


For the runtime image, select 公共镜像 (public images) -> pytorch and use the latest version. Then, under advanced settings, select the pre-trained model chatglm-6b (this preloads the ChatGLM model onto the server so you don't have to download it manually) and create the instance (make sure your account has enough balance).


After about five minutes the workspace is created. Click 进入 (Enter) -> JupyterLab to get onto the server and prepare to install ChatGLM.


4. Deploy ChatGLM

4.1 Git acceleration configuration

To keep git clone from being painfully slow, first set up git academic-resource acceleration on the command line:

# Run the following two commands to enable git academic-resource acceleration
git config --global http.proxy socks5h://172.16.16.39:8443
git config --global https.proxy socks5h://172.16.16.39:8443

With this in place, the git clone command in the following steps will not hang.

Cancelling the acceleration is just as simple; run the following commands (do this after all the steps are finished):

# Cancel git academic-resource acceleration
git config --global --unset https.proxy
git config --global --unset http.proxy

4.2 Download ChatGLM source code

After entering the Jupyter page, you will see two directories:

  • data — stores data; shared across the platform
  • imported_models — stores the pre-trained model you selected when creating the workspace

Open the data directory; you should see a ChatGLM-6B folder containing the ChatGLM source code.

If there is no ChatGLM-6B directory, download the code in this step as follows:

Open a Terminal from the Jupyter page and run this command in it:

git clone https://github.com/THUDM/ChatGLM-6B.git


4.3 Install dependencies

1. Switch into the ChatGLM-6B directory:

cd ChatGLM-6B

2. Next, edit requirements.txt and add any missing dependencies. Append the following to the end of the file; if these three packages are already listed, no change is needed.

chardet
streamlit
streamlit-chat

3. Save the file after editing.


4. Then install the dependencies from the command line:

# Downloading from the default index times out, so use Tsinghua's pip mirror
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple/


This step may fail with the following error:

ERROR: Could not install packages due to an OSError: Missing dependencies for SOCKS support.


Solution: switch to the root user and re-run the command:

# Switch to the root user
sudo su

# Re-run the installation
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple/
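After the install finishes, you can quickly confirm that the new packages are importable. A small sketch; note that the pip package streamlit-chat is imported as the module streamlit_chat:

```python
import importlib.util

def missing_packages(module_names):
    """Return the modules from `module_names` that cannot be found."""
    return [n for n in module_names if importlib.util.find_spec(n) is None]

# The three modules added to requirements.txt above; an empty result means
# all of them installed correctly:
# missing_packages(["chardet", "streamlit", "streamlit_chat"])
```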

4.4 Startup script modification

  • Because the model is stored in a separate folder, the model-loading code in the startup script must be updated
  • To access our ChatGLM from the public internet, change the listening address to 0.0.0.0 and the port to 27777, which is the debugging address exposed by the Lanrui Xingzhou platform

Modification steps:

1. Modify the model path in the web_demo2.py file, replacing it with the absolute path of the model:

Code before the change:

    tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
    model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()

Code after the change:

    tokenizer = AutoTokenizer.from_pretrained("/home/user/imported_models/chatglm-6b", trust_remote_code=True)
    model = AutoModel.from_pretrained("/home/user/imported_models/chatglm-6b", trust_remote_code=True).half().cuda()

Press Ctrl+S to save after editing.
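Before launching, it is worth verifying that the absolute path actually points at the model. A minimal check, assuming the standard Hugging Face layout where a model directory contains a config.json:

```python
import os

MODEL_DIR = "/home/user/imported_models/chatglm-6b"  # the path used in web_demo2.py

def looks_like_model_dir(path: str) -> bool:
    """A pre-trained Hugging Face model directory normally contains config.json."""
    return os.path.isdir(path) and os.path.isfile(os.path.join(path, "config.json"))

if __name__ == "__main__":
    if not looks_like_model_dir(MODEL_DIR):
        print(f"Warning: {MODEL_DIR} does not look like a model directory")
```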

4.5 Start ChatGLM

In the ChatGLM-6B directory, run on the command line:

python3 -m streamlit run ./web_demo2.py --server.port 27777 --server.address 0.0.0.0

This starts ChatGLM's web UI.


If you see the line http://0.0.0.0:27777 in the output, the service has started successfully.
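You can also confirm the port is reachable from another terminal on the server. A small sketch using only the standard library (the host and port match the startup command above):

```python
import socket

def port_is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Once streamlit is up:
# port_is_listening("127.0.0.1", 27777) should return True
```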

5. Usage

Now we need to access the newly deployed service from a browser. Return to the Lanrui Xingzhou platform.

On the workspace page, under 自定义服务 (custom services), click to copy the debugging link, then open the copied link in a browser.


You can then start a conversation on this page.

Note that on the first exchange the program loads the model file, which takes a while. You can watch the loading progress in the command line where you just started the service.

Once the first load completes, subsequent responses will be fast.

6. Dialogue effect

At this point, the entire installation and deployment process is complete. Let's look at the results. The copied link also opens on a phone; below is the effect on mobile.

7. Stopping and restarting the service

  • Because the service is billed by usage, click 停止运行 (stop) on the page when you are not using it.


  • When you want to run the service again, click the 启动 (start) button on the workspace page. Once the workspace has been recreated, enter Jupyter and start again from the command line:
# Enter the ChatGLM-6B directory
cd data/ChatGLM-6B/
# If no system disk is mounted, the dependencies must be reinstalled
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple/
# Start the service
python3 -m streamlit run ./web_demo2.py --server.port 27777 --server.address 0.0.0.0


8. Try ChatGLM for free

Xiaojuan has prepared two ways for you to try it for free:

(1) Register a platform account via the link below and enter the invitation code 4104; the platform will credit your account for free, and you can then deploy and try it yourself.

https://www.lanrui-ai.com/register?invitation_code=4104

(2) Xiaojuan has also prepared a personal ChatGLM trial address for everyone, usable for a few days. To get it, reply with the keyword ChatGLM in the official account 卷福同学.


Origin blog.csdn.net/qq_36624086/article/details/130098046