Unlock the potential of ChatGLM-6B: optimize large language model training, break through task difficulties and answer parsing problems
Large language models (LLMs) usually carry a lot of prior knowledge, which lets them perform well on many natural language processing tasks.
However, using an LLM directly for a task makes answer parsing difficult: it is hard to get the model to follow a standard output format and to stick strictly to the input information.
Therefore, in this project we build on the code of ChatGLM-Tuning and fine-tune the large model ChatGLM-6B so that its output better aligns with the format we need.
1. Environment installation
Since the environment required by ChatGLM is different from that of other experiments in this project, we strongly recommend that you create a new virtual environment to execute all the code in this directory.
Below, we use Anaconda as an example to show how to quickly build the environment:
- Create a virtual environment (you can change llm_env to any environment name you like):
conda create -n llm_env python=3.8
- Activate the newly created virtual environment and install the corresponding dependency packages:
conda activate llm_env
pip install -r requirements.txt
- Install the matching version of peft:
cd peft-chatglm
python setup.py install
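After the installation steps above, a quick sanity check can confirm that the new environment is usable. The sketch below is a minimal example, not part of the project: the function name check_environment is hypothetical, and the package list (torch, transformers, peft) is an assumption, since the contents of requirements.txt are not shown here.

```python
import importlib.util
import sys


def check_environment(packages=("torch", "transformers", "peft")):
    """Report whether Python is new enough and each package can be found.

    The package names are an assumption; adjust them to match requirements.txt.
    """
    report = {"python": sys.version_info >= (3, 8)}
    for name in packages:
        # find_spec returns None when a top-level package is not installed
        report[name] = importlib.util.find_spec(name) is not None
    return report


report = check_environment()
for name, ok in report.items():
    print(f"{name}: {'ok' if ok else 'MISSING'}")
```

Run it inside the activated llm_env environment; any MISSING entry suggests that one of the install steps did not complete.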
2. Dataset preparation
In this experiment, we fine-tune the model on a mixed dataset of information extraction + text classification tasks. The dataset is in data/mixed_train_dataset.jsonl.
Each piece of data is divided into two parts, context and target:
- context: the part that accepts the user's input.
- target: the part that specifies the output of the model.
The context consists of 2 parts:
- Instruction: It is used to inform the model of specific instructions. When a model is required to solve multiple tasks at the same time, different Instructions can be set to help the model determine what task it should do at the moment.
- Input: The current user's input.
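The context / target layout described above can be sketched as a small helper that assembles one record and serializes it as a JSON line. This is an illustration only: build_record is a hypothetical name, and the "Instruction: ...\nInput: ...\nAnswer: " template is inferred from the data examples in this section rather than taken from the project's code.

```python
import json


def build_record(instruction, user_input, target):
    """Assemble one training record in the context/target layout."""
    # Template inferred from the dataset examples: the context ends with
    # "Answer: " so that the model continues with the target.
    context = f"Instruction: {instruction}\nInput: {user_input}\nAnswer: "
    return {"context": context, "target": target}


record = build_record(
    instruction="你现在是一个很厉害的阅读理解器,严格按照人类指令进行回答。",
    user_input="下面句子可能是一条关于什么的评论,用列表形式回答:\n\n很不错,很新鲜,快递小哥服务很好,水果也挺甜挺脆的",
    target='["水果"]',
)

# One record per line; ensure_ascii=False keeps the Chinese text readable.
line = json.dumps(record, ensure_ascii=False)
print(line)
```

Appending such lines to a .jsonl file gives a dataset in the same shape as data/mixed_train_dataset.jsonl.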
- Information Extraction Data Example
The Instruction part tells the model that it needs to do the "reading comprehension" task now, and the Input part tells the model the sentences to be extracted and the format of the output.
{
"context": "Instruction: 你现在是一个很厉害的阅读理解器,严格按照人类指令进行回答。\nInput: 找到句子中的三元组信息并输出成json给我:\n\n九玄珠是在纵横中文网连载的一部小说,作者是龙马。\nAnswer: ",
"target": "```json\n[{\"predicate\": \"连载网站\", \"object_type\": \"网站\", \"subject_type\": \"网络小说\", \"object\": \"纵横中文网\", \"subject\": \"九玄珠\"}, {\"predicate\": \"作者\", \"object_type\": \"人物\", \"subject_type\": \"图书作品\", \"object\": \"龙马\", \"subject\": \"九玄珠\"}]\n```"
}
- Text Classification Data Example
The Instruction part again tells the model that it needs to do the "reading comprehension" task, and the Input part tells the model the sentence to be classified and the format of the output.
{
"context": "Instruction: 你现在是一个很厉害的阅读理解器,严格按照人类指令进行回答。\nInput: 下面句子可能是一条关于什么的评论,用列表形式回答:\n\n很不错,很新鲜,快递小哥服务很好,水果也挺甜挺脆的\nAnswer: ",
"target": "[\"水果\"]"
}
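Because both example targets are JSON (one wrapped in a Markdown json fence, one a bare list), the model's answers can be parsed with a few lines of standard-library code. The sketch below is a minimal example under that assumption; parse_answer is a hypothetical helper, not a function from this project.

```python
import json
import re

# Three backticks, built programmatically so this example's own fence stays intact.
FENCE = "`" * 3


def parse_answer(text):
    """Return the JSON value contained in a model answer, or None if parsing fails."""
    text = text.strip()
    # Strip an optional Markdown code fence around the JSON payload.
    pattern = re.escape(FENCE) + r"(?:json)?\s*(.*?)\s*" + re.escape(FENCE)
    match = re.search(pattern, text, flags=re.DOTALL)
    if match:
        text = match.group(1)
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # The model did not follow the expected output format.
        return None


fenced = FENCE + 'json\n[{"predicate": "作者", "object": "龙马", "subject": "九玄珠"}]\n' + FENCE
print(parse_answer(fenced))
print(parse_answer('["水果"]'))
```

Returning None on malformed output makes it easy to count how often the fine-tuned model breaks the format.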
3. Model training
3.1 Single card training
Two fine-tuning methods, LoRA Finetune and P-Tuning, are supported in the experiment.
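For intuition about the --lora_rank flag used in the LoRA command: LoRA freezes a weight matrix and trains a low-rank update made of two small matrices, so a d_out x d_in matrix adds only rank x (d_in + d_out) trainable parameters. The sketch below is plain arithmetic, not this project's code; lora_trainable_params is a hypothetical name, and the 4096 hidden size is an assumption about ChatGLM-6B (real layers are not all square).

```python
def lora_trainable_params(d_in, d_out, rank):
    """Trainable parameters LoRA adds for one d_out x d_in weight matrix.

    The update is B @ A with A of shape (rank, d_in) and B of shape (d_out, rank).
    """
    return rank * d_in + d_out * rank


hidden = 4096  # assumed hidden size, for illustration only
full = hidden * hidden                           # parameters in one full square matrix
lora = lora_trainable_params(hidden, hidden, 8)  # rank 8, as in --lora_rank 8
ratio = lora / full

print(f"full: {full}, lora: {lora}, ratio: {ratio:.4%}")
```

Under these assumptions a rank of 8 trains well under 1% of the parameters of each adapted matrix, which is why LoRA fits in far less video memory than full fine-tuning.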
Run the train.sh file, adjusting the batch_size, max_source_seq_len, and max_target_seq_len parameters to fit the video memory of your own GPU:
# LoRA Finetune
python train.py \
--train_path data/mixed_train_dataset.jsonl \
--dev_path data/mixed_dev_dataset.jsonl \
--use_lora True \
--lora_rank 8 \
--batch_size 1 \
--num_train_epochs 2 \
--save_freq 1000 \
--learning_rate 3e-5 \
--logging_steps 100 \
--max_source_seq_len 400 \
--max_target_seq_len 300 \
--save_dir checkpoints/finetune \
--img_log_dir "log/fintune_log" \
--img_log_name "ChatGLM Fine-Tune" \
--device cuda:0
# P-Tuning
python train.py \
--train_path data/mixed_train_dataset.jsonl \
--dev_path data/mixed_dev_dataset.jsonl \
--use_ptuning True \
--pre_seq_len 128 \
--batch_size 1 \
--num_train_epochs 2 \
--save_freq 200 \
--learning_rate 2e-4 \
--logging_steps 100 \
--max_source_seq_len 400 \
--max_target_seq_len 300 \
--save_dir checkpoints/ptuning \
--img_log_dir "log/fintune_log" \
--img_log_name "ChatGLM P-Tuning" \
--device cuda:0
After the program starts successfully, you will see output like the following:
...
global step 900 ( 49.89% ) , epoch: 1, loss: 0.78065, speed: 1.25 step/s, ETA: 00:12:05
global step 1000 ( 55.43% ) , epoch: 2, loss: 0.71768, speed: 1.25 step/s, ETA: 00:10:44
Model has saved at checkpoints/model_1000.
Evaluation Loss: 0.17297
Min eval loss has been updated: 0.26805 --> 0.17297
Best model has saved at checkpoints/model_best.
global step 1100 ( 60.98% ) , epoch: 2, loss: 0.66633, speed: 1.24 step/s, ETA: 00:09:26
global step 1200 ( 66.52% ) , epoch: 2, loss: 0.62207, speed: 1.24 step/s, ETA: 00:08:06
...
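If you want to post-process the training output yourself, the "global step" lines shown above can be parsed with a regular expression. This is a minimal sketch: parse_log_line is a hypothetical helper, and the pattern is written against the sample lines in this section only, so other log formats may need adjustments.

```python
import re

# Pattern written against the sample log lines; whitespace is matched loosely.
LOG_PATTERN = re.compile(
    r"global step (\d+) \(\s*([\d.]+)%\s*\)\s*, epoch: (\d+), "
    r"loss: ([\d.]+), speed: ([\d.]+) step/s, ETA: (\d{2}:\d{2}:\d{2})"
)


def parse_log_line(line):
    """Return the fields of a 'global step' log line, or None for other lines."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    step, percent, epoch, loss, speed, eta = match.groups()
    return {
        "step": int(step),
        "percent": float(percent),
        "epoch": int(epoch),
        "loss": float(loss),
        "speed": float(speed),
        "eta": eta,
    }


sample = "global step 900 ( 49.89% ) , epoch: 1, loss: 0.78065, speed: 1.25 step/s, ETA: 00:12:05"
parsed = parse_log_line(sample)
print(parsed)
```

Collecting the step and loss fields over a whole run gives the points needed to redraw the loss curve.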
In the directory passed as --img_log_dir (log/fintune_log in the commands above), you will find a graph of the training loss.
3.2 Multi-card training
Run the train_multi_gpu.sh file. Use CUDA_VISIBLE_DEVICES to specify the available graphics cards and num_processes to specify the number of graphics cards to use:
# LoRA Finetune
CUDA_VISIBLE_DEVICES=0,1 accelerate launch --multi_gpu --mixed_precision=fp16 --num_processes=2 train_multi_gpu.py \
--train_path data/mixed_train_dataset.jsonl \
--dev_path data/mixed_dev_dataset.jsonl \
--use_lora True \
--lora_rank 8 \
--batch_size 1 \
--num_train_epochs 2 \
--save_freq 500 \
--learning_rate 3e-5 \
--logging_steps 100 \
--max_source_seq_len 400 \
--max_target_seq_len 300 \
--save_dir checkpoints_parrallel/finetune \
--img_log_dir "log/fintune_log" \
--img_log_name "ChatGLM Fine-Tune(parallel)"
# P-Tuning
CUDA_VISIBLE_DEVICES=0,1 accelerate launch --multi_gpu --mixed_precision=fp16 --num_processes=2 train_multi_gpu.py \
--train_path data/mixed_train_dataset.jsonl \
--dev_path data/mixed_dev_dataset.jsonl \
--use_ptuning True \
--pre_seq_len 128 \
--batch_size 1 \
--num_train_epochs 2 \
--save_freq 500 \
--learning_rate 2e-4 \
--logging_steps 100 \
--max_source_seq_len 400 \
--max_target_seq_len 300 \
--save_dir checkpoints_parrallel/ptuning \
--img_log_dir "log/fintune_log" \
--img_log_name "ChatGLM P-Tuning(parallel)"
On the same dataset, training time on a single card:
Used 00:27:18.
Training time on multiple cards (2 in parallel):
Used 00:13:05.
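The two timings above can be turned into a speedup figure with a few lines of arithmetic. This is only a sketch over the two numbers reported here (to_seconds is a hypothetical helper); a single run per setting is a rough measurement, not a benchmark.

```python
def to_seconds(hms):
    """Convert an HH:MM:SS string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


single_card = to_seconds("00:27:18")  # 1638 seconds
two_cards = to_seconds("00:13:05")    # 785 seconds
speedup = single_card / two_cards

print(f"speedup with 2 cards: {speedup:.2f}x")
```

With these numbers the speedup is about 2.09x, so training time roughly halves when a second card is added.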
4. Model prediction
Set the path of the trained model in inference.py, then run python inference.py to test the trained model:
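from transformers import AutoModel, AutoTokenizer

device = 'cuda:0'
max_new_tokens = 300
model_path = "checkpoints/model_1000"  # path where the trained model is stored

# trust_remote_code=True allows loading the custom model code shipped with the checkpoint
tokenizer = AutoTokenizer.from_pretrained(
    model_path,
    trust_remote_code=True
)

# Load in half precision (fp16), then move the model to the chosen device
model = AutoModel.from_pretrained(
    model_path,
    trust_remote_code=True
).half().to(device)
...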
You can also use the Playground we provide to test the model effect:
streamlit run playground_local.py --server.port 8001
Then open machine-ip:8001 in your browser, replacing machine-ip with the IP address of the machine running the Playground.
5. Annotation platform
If you need to label your own data, you can do it in the Playground as well.
streamlit run playground_local.py --server.port 8001
Then open machine-ip:8001 in your browser, replacing machine-ip with the IP address of the machine running the Playground.
Project link: https://github.com/HarderThenHarder/transformers_tasks/blob/main/LLM/chatglm_finetune/readme.md