Unlock the potential of ChatGLM-6B: optimize large language model training, overcome task difficulties and answer-parsing problems

LLMs (Large Language Models) typically carry a great deal of prior knowledge, which gives them strong performance on many natural language processing tasks.

However, if you use an LLM directly to complete certain tasks, answer parsing can be difficult: for example, getting the model to standardize its output format and to strictly follow the input information.

Therefore, in this project we draw on the code of ChatGLM-Tuning and try to finetune the large model ChatGLM-6B so that its output better aligns with the format we need.

1. Environment installation

Since the environment required by ChatGLM differs from that of the other experiments in this project, we strongly recommend creating a new virtual environment in which to run all the code in this directory.

Below, we take Anaconda as an example to show how to quickly set up the environment:

  1. Create a virtual environment; you can change llm_env to any environment name you like:
conda create -n llm_env python=3.8
  2. Activate the newly created virtual environment and install the corresponding dependency packages:
conda activate llm_env
pip install -r requirements.txt
  3. Install the corresponding version of peft:
cd peft-chatglm
python setup.py install
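
Once these steps finish, you can sanity-check the installation with a quick import test (the exact versions printed depend on requirements.txt):

python -c "import torch, transformers, peft; print(torch.__version__, transformers.__version__, peft.__version__)"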

2. Dataset preparation

In this experiment, we feed the model a mixed dataset of information extraction (信息抽取) and text classification (文本分类) tasks for finetuning; the dataset is located at data/mixed_train_dataset.jsonl.

Each sample is divided into two parts, context and target:

  1. The context part holds the user's input.

  2. The target part specifies the expected output of the model.

The context itself consists of 2 parts:

  1. Instruction: tells the model which task to carry out. When one model is required to solve multiple tasks at the same time, different Instructions can be set to help the model decide which task it should perform at the moment.

  2. Input: The current user's input.

  • Information Extraction Data Example

The Instruction part tells the model that it now needs to perform a "reading comprehension" task, and the Input part gives it the sentence to extract from and the required output format.

{
    "context": "Instruction: 你现在是一个很厉害的阅读理解器,严格按照人类指令进行回答。\nInput: 找到句子中的三元组信息并输出成json给我:\n\n九玄珠是在纵横中文网连载的一部小说,作者是龙马。\nAnswer: ", 
    "target": "```json\n[{\"predicate\": \"连载网站\", \"object_type\": \"网站\", \"subject_type\": \"网络小说\", \"object\": \"纵横中文网\", \"subject\": \"九玄珠\"}, {\"predicate\": \"作者\", \"object_type\": \"人物\", \"subject_type\": \"图书作品\", \"object\": \"龙马\", \"subject\": \"九玄珠\"}]\n```"
}
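
Note that the target wraps the extracted triples in a ```json code fence. This convention is what makes answer parsing tractable; as a hedged sketch (illustrative, not part of the project's code), downstream code could recover structured data like this:

import json
import re

def parse_json_answer(answer: str):
    """Extract the JSON payload from a ```json ... ``` fenced answer."""
    match = re.search(r"```json\s*(.*?)\s*```", answer, re.DOTALL)
    if match is None:
        return None                    # the model did not follow the format
    return json.loads(match.group(1))  # raises ValueError if the JSON is malformed

triples = parse_json_answer("```json\n[{\"predicate\": \"作者\", \"subject\": \"九玄珠\", \"object\": \"龙马\"}]\n```")
print(triples[0]["predicate"])  # -> 作者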
  • Text Classification Data Example

Here, the Instruction part again tells the model to act as a "reading comprehension" model, while the Input part gives it the sentence to classify and asks for the answer in list form.

{
    "context": "Instruction: 你现在是一个很厉害的阅读理解器,严格按照人类指令进行回答。\nInput: 下面句子可能是一条关于什么的评论,用列表形式回答:\n\n很不错,很新鲜,快递小哥服务很好,水果也挺甜挺脆的\nAnswer: ", 
    "target": "[\"水果\"]"
}
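
For reference, here is a minimal sketch of how such a .jsonl file can be read into (context, target) pairs for training; the field names follow the examples above, and everything else is illustrative:

import json

def load_mixed_dataset(path: str):
    """Read a jsonl file in which each line is a dict with 'context' and 'target'."""
    samples = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            item = json.loads(line)
            # 'context' already ends with "Answer: ", so context + target
            # concatenate directly into one training sequence.
            samples.append((item["context"], item["target"]))
    return samples

samples = load_mixed_dataset("data/mixed_train_dataset.jsonl")
print(len(samples))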

3. Model training

3.1 Single-GPU training

The experiment supports two fine-tuning methods: LoRA Finetune and P-Tuning.

Run the train.sh file, adjusting the batch_size, max_source_seq_len, and max_target_seq_len parameters according to your GPU memory:

# LoRA Finetune
python train.py \
    --train_path data/mixed_train_dataset.jsonl \
    --dev_path data/mixed_dev_dataset.jsonl \
    --use_lora True \
    --lora_rank 8 \
    --batch_size 1 \
    --num_train_epochs 2 \
    --save_freq 1000 \
    --learning_rate 3e-5 \
    --logging_steps 100 \
    --max_source_seq_len 400 \
    --max_target_seq_len 300 \
    --save_dir checkpoints/finetune \
    --img_log_dir "log/fintune_log" \
    --img_log_name "ChatGLM Fine-Tune" \
    --device cuda:0


# P-Tuning
python train.py \
    --train_path data/mixed_train_dataset.jsonl \
    --dev_path data/mixed_dev_dataset.jsonl \
    --use_ptuning True \
    --pre_seq_len 128 \
    --batch_size 1 \
    --num_train_epochs 2 \
    --save_freq 200 \
    --learning_rate 2e-4 \
    --logging_steps 100 \
    --max_source_seq_len 400 \
    --max_target_seq_len 300 \
    --save_dir checkpoints/ptuning \
    --img_log_dir "log/fintune_log" \
    --img_log_name "ChatGLM P-Tuning" \
    --device cuda:0
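
For intuition, the --use_lora path conceptually freezes the base model and injects small low-rank adapters via the bundled peft library. A minimal sketch of that idea (illustrative; the actual train.py may differ in details such as alpha and dropout):

from transformers import AutoModel
from peft import LoraConfig, TaskType, get_peft_model

# Load the base model; half precision keeps single-GPU memory manageable.
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half()

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,               # corresponds to --lora_rank 8
    lora_alpha=32,     # illustrative scaling value
    lora_dropout=0.1,  # illustrative dropout value
)
model = get_peft_model(model, lora_config)  # only the adapter weights stay trainable
model.print_trainable_parameters()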

After the program starts successfully, you will see output like the following:

...
global step 900 ( 49.89% ) , epoch: 1, loss: 0.78065, speed: 1.25 step/s, ETA: 00:12:05
global step 1000 ( 55.43% ) , epoch: 2, loss: 0.71768, speed: 1.25 step/s, ETA: 00:10:44
Model has saved at checkpoints/model_1000.
Evaluation Loss: 0.17297
Min eval loss has been updated: 0.26805 --> 0.17297
Best model has saved at checkpoints/model_best.
global step 1100 ( 60.98% ) , epoch: 2, loss: 0.66633, speed: 1.24 step/s, ETA: 00:09:26
global step 1200 ( 66.52% ) , epoch: 2, loss: 0.62207, speed: 1.24 step/s, ETA: 00:08:06
...

Under log/finetune_log you will find a plot of the training loss.

3.2 Multi-GPU training

Run the train_multi_gpu.sh file, using CUDA_VISIBLE_DEVICES to specify which GPUs are available and num_processes to specify how many GPUs to use:

# LoRA Finetune
CUDA_VISIBLE_DEVICES=0,1 accelerate launch --multi_gpu --mixed_precision=fp16 --num_processes=2 train_multi_gpu.py \
    --train_path data/mixed_train_dataset.jsonl \
    --dev_path data/mixed_dev_dataset.jsonl \
    --use_lora True \
    --lora_rank 8 \
    --batch_size 1 \
    --num_train_epochs 2 \
    --save_freq 500 \
    --learning_rate 3e-5 \
    --logging_steps 100 \
    --max_source_seq_len 400 \
    --max_target_seq_len 300 \
    --save_dir checkpoints_parrallel/finetune \
    --img_log_dir "log/fintune_log" \
    --img_log_name "ChatGLM Fine-Tune(parallel)"


# P-Tuning
CUDA_VISIBLE_DEVICES=0,1 accelerate launch --multi_gpu --mixed_precision=fp16 --num_processes=2 train_multi_gpu.py \
    --train_path data/mixed_train_dataset.jsonl \
    --dev_path data/mixed_dev_dataset.jsonl \
    --use_ptuning True \
    --pre_seq_len 128 \
    --batch_size 1 \
    --num_train_epochs 2 \
    --save_freq 500 \
    --learning_rate 2e-4 \
    --logging_steps 100 \
    --max_source_seq_len 400 \
    --max_target_seq_len 300 \
    --save_dir checkpoints_parrallel/ptuning \
    --img_log_dir "log/fintune_log" \
    --img_log_name "ChatGLM P-Tuning(parallel)"
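
For reference, multi-GPU training with accelerate typically wraps the training loop as sketched below; this is illustrative, not the project's exact train_multi_gpu.py, and assumes model, optimizer, and dataloader are built as in single-GPU training:

from accelerate import Accelerator

accelerator = Accelerator()  # picks up --num_processes / --mixed_precision from the launcher

# Wrap the usual objects so data and gradients are sharded/synced across GPUs.
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for batch in dataloader:
    loss = model(**batch).loss
    accelerator.backward(loss)  # replaces loss.backward(); syncs gradients across processes
    optimizer.step()
    optimizer.zero_grad()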

On the same dataset, training time with a single GPU:

Used 00:27:18.

Training time with multiple GPUs (2 in parallel):

Used 00:13:05.

4. Model prediction

Modify the path where the trained model is stored, then run python inference.py to test the effect of the trained model:

from transformers import AutoTokenizer, AutoModel

device = 'cuda:0'
max_new_tokens = 300
model_path = "checkpoints/model_1000"           # path where the trained model is stored

tokenizer = AutoTokenizer.from_pretrained(
    model_path, 
    trust_remote_code=True
)

model = AutoModel.from_pretrained(
    model_path,
    trust_remote_code=True
).half().to(device)
...
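
The elided part then runs generation. As a hedged sketch of what that step might look like (the actual inference.py may differ), reusing tokenizer, model, device, and max_new_tokens from above:

import torch

context = "Instruction: 你现在是一个很厉害的阅读理解器,严格按照人类指令进行回答。\nInput: 找到句子中的三元组信息并输出成json给我:\n\n九玄珠是在纵横中文网连载的一部小说,作者是龙马。\nAnswer: "

with torch.no_grad():
    inputs = tokenizer(context, return_tensors="pt").to(device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Strip the prompt tokens so only the newly generated answer remains.
    answer = tokenizer.decode(outputs[0][len(inputs["input_ids"][0]):])
    print(answer)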

You can also use the Playground we provide to test the model interactively:

streamlit run playground_local.py --server.port 8001

Then open machine_ip:8001 (your machine's IP, port 8001) in your browser to access it.
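
playground_local.py ships with the project; for intuition only, a toy stand-in with the same shape (load the model once, take a prompt, show the answer) might look like this, with all names and paths illustrative:

import streamlit as st
from transformers import AutoTokenizer, AutoModel

MODEL_PATH = "checkpoints/model_1000"  # illustrative; point at your own checkpoint

@st.cache_resource
def load_model():
    tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
    model = AutoModel.from_pretrained(MODEL_PATH, trust_remote_code=True).half().cuda()
    return tokenizer, model

tokenizer, model = load_model()
context = st.text_area("Prompt", "Instruction: ...\nInput: ...\nAnswer: ")
if st.button("Generate"):
    inputs = tokenizer(context, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=300)
    st.write(tokenizer.decode(outputs[0][len(inputs["input_ids"][0]):]))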

5. Annotation platform

If you need to annotate your own data, you can do that in the Playground as well:

streamlit run playground_local.py --server.port 8001

Then open machine_ip:8001 in your browser to access it.

Project link: https://github.com/HarderThenHarder/transformers_tasks/blob/main/LLM/chatglm_finetune/readme.md
