ChatGLM2-6B and ChatGLM-6B: training on your own dataset in practice

Based on the following two bloggers’ articles:

ChatGLM2-6B, ChatGLM-6B model introduction and practical training of your own data set_dream_home8407's blog-CSDN blog

Detailed tutorial on ChatGLM-6B model deployment and ptuning fine-tuning_sawyes' blog-CSDN blog

Introduction

In discussions, the distinction between GLM-130B, the ChatGLM model, and ChatGLM-6B is often blurred. Here is a brief clarification:

  • GLM-130B: open-sourced by Tsinghua and Zhipu AI in August 2022. This large language model is based on the previously proposed GLM (General Language Model) architecture, with adjustments to normalization, activation functions, and the mask mechanism. The goal was to train an open-source, high-accuracy, Chinese-English bilingual dense model so that more developers can work with a hundred-billion-parameter model.
  • ChatGLM (hundred-billion-parameter model): opened applications for internal testing in March 2023; public applications are currently suspended. This model addresses the shortcomings of the large base model on complex problems, dynamic knowledge, and human-alignment scenarios. Based on GLM-130B, it introduces dialogue-oriented user feedback and instruction fine-tuning to obtain a conversational chatbot.
  • ChatGLM-6B: open-sourced in March 2023. While the hundred-billion-parameter ChatGLM was in internal testing, the Tsinghua team also released a small-parameter version built with the same techniques, so that developers can study and build on it (non-commercial use). The latest version is ChatGLM2-6B.

ChatGLM-6B is an open-source dialogue model for text generation based on the General Language Model (GLM) framework, with 6.2 billion parameters. Combined with model quantization, the measured GPU memory usage for INT4 training on a 2080 Ti is about 6 GB.

Advantages:

1. Lower deployment threshold: at FP16 half precision, ChatGLM-6B needs at least 13 GB of GPU memory for inference. Combined with model quantization, this requirement can be reduced further to 10 GB (INT8) or 6 GB (INT4), so ChatGLM-6B can be deployed on consumer-grade graphics cards (a minimal loading sketch follows this list).
2. Longer sequence length: compared with GLM-10B (sequence length 1024), ChatGLM2-6B supports a sequence length of 32K, enabling longer conversations and applications.
3. Human intent alignment training: supervised fine-tuning, feedback bootstrapping, and reinforcement learning from human feedback (RLHF) give the model an initial ability to understand the intent of human instructions. Output is formatted in Markdown for easy display. So far, the supervised fine-tuning method has been open-sourced.
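As a quick illustration of point 1, here is a minimal INT4 loading sketch in the style of the official THUDM/chatglm-6b Hugging Face usage; the model ID assumes you download from Hugging Face, so adjust to a local path if needed:

# Minimal sketch: load ChatGLM-6B quantized to INT4 (~6 GB of GPU memory).
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
# quantize(4) follows the usage shown in the official repository; quantize(8) gives INT8
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).quantize(4).half().cuda()
model = model.eval()

response, history = model.chat(tokenizer, "你好", history=[])
print(response)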

Shortcomings:

1. Small model capacity: the 6B parameter count means relatively weak model memory and language ability. As the amount and number of rounds of your own training data grow, the model gradually loses its original dialogue ability. According to Dr. Yu Kuifei of Zhipu AI, the recommended amount of custom training data is about 1,000 examples.

2. Weak multi-turn dialogue ability: ChatGLM-6B's context understanding is limited; in long-answer generation and multi-turn dialogue scenarios, context may be lost or misunderstood. Solution: attach an external knowledge base, for example combining ChatGLM-6B with langchain to implement a local knowledge base (link).

3. Catastrophic forgetting: after training on your own data, the model forgets its previous conversational ability. The solution is to mix a general open-source dialogue fine-tuning dataset into your own professional-domain data and train on both together; a minimal mixing sketch follows.
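For point 3, here is a minimal sketch of that mixing step, assuming both datasets are JSON-lines files with the same schema as the training format shown later in this post (the file names are placeholders):

# Minimal sketch: blend domain data with general dialogue data to reduce catastrophic forgetting.
# File names are placeholders; both files are assumed to be JSON lines with the same schema.
import json
import random

def load_jsonl(path):
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

domain = load_jsonl("my_domain_data.jsonl")
general = load_jsonl("general_dialogue_data.jsonl")

mixed = domain + general
random.shuffle(mixed)  # interleave the two sources

with open("train_mixed.jsonl", "w", encoding="utf-8") as f:
    for item in mixed:
        f.write(json.dumps(item, ensure_ascii=False) + "\n")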

The difference between ChatGLM2-6B and ChatGLM-6B:

ChatGLM2-6B is the second-generation version of the open-source Chinese-English bilingual dialogue model ChatGLM-6B. While retaining many excellent features of the first-generation model, such as smooth conversation and a low deployment threshold, ChatGLM2-6B introduces the following new features:

  • More powerful performance: building on the development experience of the first-generation ChatGLM model, the base model of ChatGLM2-6B has been comprehensively upgraded. ChatGLM2-6B uses the hybrid objective function of GLM and has undergone pre-training on 1.4T Chinese and English tokens as well as human preference alignment training. Evaluation results show that, compared with the first-generation model, ChatGLM2-6B improves substantially on datasets such as MMLU (+23%), C-Eval (+33%), GSM8K (+571%), and BBH (+60%), making it highly competitive among open-source models of the same size.
  • Longer context: based on FlashAttention, the context length of the base model has been extended from 2K in ChatGLM-6B to 32K, and an 8K context length is used during dialogue training to allow more rounds of conversation. However, the current version of ChatGLM2-6B has limited ability to understand single-turn ultra-long documents; this will be a focus of optimization in subsequent iterations.
  • More efficient inference: based on Multi-Query Attention, ChatGLM2-6B has faster inference speed and lower GPU memory usage: under the official model implementation, inference speed is 42% higher than the first generation, and under INT4 quantization the dialogue length supported by 6 GB of GPU memory increases from 1K to 8K.
  • A more open license: ChatGLM2-6B weights are fully open for academic research, and commercial use is also permitted with official written permission.

Fine-tuning:

1.1 ChatGLM2-6B: the official open-source training method is P-Tuning v2 fine-tuning:
Link: git_link
An alternative approach is based on QLoRA:
Link: git_link

1.2 ChatGLM-6B is fine-tuned based on P-Tuning v2:
Link: git_link

Dataset:

Sample data download link: Dataset
Convert your own dataset into the following format (a conversion sketch follows the example):

{
    "content": "类型#上衣*版型#宽松*版型#显瘦*图案#线条*衣样式#衬衫*衣袖型#泡泡袖*衣款式#抽绳",
    "summary": "这件衬衫的款式非常的宽松,利落的线条可以很好的隐藏身材上的小缺点,穿在身上有着很好的显瘦效果。领口装饰了一个可爱的抽绳,漂亮的绳结展现出了十足的个性,配合时尚的泡泡袖型,尽显女性甜美可爱的气息。"
}
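A minimal conversion sketch, assuming your raw data is a CSV file with two columns named input and output (the file name and column names are placeholders for your own data); it writes one JSON object per line:

# Minimal sketch: convert a two-column CSV into the content/summary JSON-lines format above.
# "my_data.csv" and the column names "input"/"output" are placeholders for your own data.
import csv
import json

with open("my_data.csv", encoding="utf-8") as fin, \
     open("train.json", "w", encoding="utf-8") as fout:
    for row in csv.DictReader(fin):
        record = {"content": row["input"], "summary": row["output"]}
        fout.write(json.dumps(record, ensure_ascii=False) + "\n")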

If you need to use multi-turn conversation data to fine-tune the model, you can provide chat history. For example, the following is the training data of a three-turn conversation:

{"prompt": "长城h3风扇不转。继电器好的。保险丝好的传感器新的风扇也新的这是为什么。就是继电器缺一个信号线", "response": "用电脑能读数据流吗?水温多少", "history": []}
{"prompt": "95", "response": "上下水管温差怎么样啊?空气是不是都排干净了呢?", "history": [["长城h3风扇不转。继电器好的。保险丝好的传感器新的风扇也新的这是为什么。就是继电器缺一个信号线", "用电脑能读数据流吗?水温多少"]]}
{"prompt": "是的。上下水管都好的", "response": "那就要检查线路了,一般风扇继电器是由电脑控制吸合的,如果电路存在断路,或者电脑坏了的话会出现继电器不吸合的情况!", "history": [["长城h3风扇不转。继电器好的。保险丝好的传感器新的风扇也新的这是为什么。就是继电器缺一个信号线", "用电脑能读数据流吗?水温多少"], ["95", "上下水管温差怎么样啊?空气是不是都排干净了呢?"]]}

During training, specify --history_column as the key of the chat history in the data (history in this example); the chat history will then be concatenated automatically. Note that content exceeding the input length max_source_length will be truncated. A sketch for building this multi-turn format follows.
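A minimal sketch for producing that structure, assuming the raw conversation is an ordered list of (user, assistant) turns (variable and file names are illustrative):

# Minimal sketch: convert an ordered list of (user, assistant) turns into
# the prompt/response/history records shown above, one record per turn.
import json

turns = [
    ("user turn 1", "assistant turn 1"),
    ("user turn 2", "assistant turn 2"),
    ("user turn 3", "assistant turn 3"),
]

with open("train_chat.jsonl", "w", encoding="utf-8") as f:
    history = []
    for user, assistant in turns:
        record = {"prompt": user, "response": assistant, "history": [list(h) for h in history]}
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
        history.append((user, assistant))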

You can refer to the following instructions:

bash train_chat.sh

Problems encountered:

1. How to run a .sh script on Windows: install Git, open Git Bash, and run the script there.

2. Set up a separate venv environment for fine-tuning to prevent version conflicts:

$ python -m venv venv
$ source venv/Scripts/activate


3. Problems encountered when running web_demo.py:

(1) The "THUDM/chatglm-6b" path cannot be read, replace it with "THUDM\\chatglm-6b"

(2) Running bash train.sh produces no response. Change "CUDA_VISIBLE_DEVICES=0 python3 main.py \" to "CUDA_VISIBLE_DEVICES=0 python main.py \".

(3) In the PyCharm terminal, CUDA failed to load. I did not investigate in detail; simply switch to the Git Bash terminal.

(4) Test script

import os
import torch
from transformers import AutoConfig, AutoModel, AutoTokenizer


# Local model and P-Tuning checkpoint paths (Windows-style separators)
MODEL_PATH = "THUDM\\chatglm-6b"
CHECKPOINT_PATH = "ptuning\\output\\adgen-chatglm-6b-pt-128-2e-2\\checkpoint-3000"

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)

# pre_seq_len must match the PRE_SEQ_LEN used during P-Tuning v2 training
config = AutoConfig.from_pretrained(MODEL_PATH, trust_remote_code=True, pre_seq_len=128)
model = AutoModel.from_pretrained(MODEL_PATH, config=config, trust_remote_code=True).cuda()

# Load only the trained prefix-encoder weights from the fine-tuned checkpoint
prefix_state_dict = torch.load(os.path.join(CHECKPOINT_PATH, "pytorch_model.bin"))
new_prefix_state_dict = {}
for k, v in prefix_state_dict.items():
    if k.startswith("transformer.prefix_encoder."):
        new_prefix_state_dict[k[len("transformer.prefix_encoder."):]] = v
model.transformer.prefix_encoder.load_state_dict(new_prefix_state_dict)

# Quantize to INT4 to reduce GPU memory; keep the prefix encoder in FP32
print("Quantized to 4 bit")
model = model.quantize(4)
model = model.half().cuda()
model.transformer.prefix_encoder.float()
model = model.eval()

# Simple interactive chat loop
print("用户:你好\n")
response, history = model.chat(tokenizer, "你好", history=[])
print("ChatGLM-6B:\n", response)
print("\n------------------------------------------------\n用户:")

line = input()
while line:
    response, history = model.chat(tokenizer, line, history=history)
    print("ChatGLM-6B:\n", response)
    print("\n------------------------------------------------\n用户:")
    line = input()

Prompt example:

# Example call
# The counsellor's reply to be paraphrased
Q = "对方不满意你的付出,你为此而感到很难过,是吗?"
# Prompt template (the f-prefix must be on the literal that contains {Q},
# otherwise {Q} is treated as plain text)
Q_motif = "你是一个心理咨询师,熟悉心理咨询需要遵循的隐私和伦理规范,请根据我提供的心理咨询师的回复,推荐3-5个语义相同但是表达近似的替代回复。" \
      f"心理咨询师的回复:{Q}。" \
      "请务必保证自己推荐的内容满足条件。需要满足限定条件中的每个细节。"
print("Q:" + Q_motif)
# Get the result
result = glm_single_QA(model, tokenizer, Q_motif, 2048, 2048)
print("A:" + result)


 

Source: blog.csdn.net/chaishen10000/article/details/131632484