Efficient fine-tuning of large models: an introduction to the PEFT framework

1. Background introduction

Recently, more and more large models have been open-sourced, but for individuals the cost of training a large model from scratch is prohibitive. This article therefore introduces an efficient fine-tuning framework for large models: PEFT.

GitHub address: https://github.com/huggingface/peft/tree/main

PEFT, which stands for Parameter-Efficient Fine-Tuning, is a parameter-efficient fine-tuning library developed by Hugging Face. It can effectively adapt pre-trained language models (PLMs) to various downstream applications without fine-tuning all of the model's parameters.
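PEFT is published on PyPI and can be installed with pip:

pip install peft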

2. Supported methods

Currently, PEFT supports the following parameter-efficient fine-tuning methods:

LoRA: LoRA: Low-Rank Adaptation of Large Language Models

Prefix Tuning: Prefix-Tuning: Optimizing Continuous Prompts for Generation, and P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks

P-Tuning: GPT Understands, Too

Prompt Tuning: The Power of Scale for Parameter-Efficient Prompt Tuning

AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning

The principles behind these methods will be explained in later chapters.

3. Supported models

The list of models currently supported by PEFT is as follows:

3-1. Causal Language Modeling

(image: table of models supported for causal language modeling)

3-2. Conditional Generation

(image: table of models supported for conditional generation)

3-3. Sequence Classification

(image: table of models supported for sequence classification)

3-4. Token Classification

(image: table of models supported for token classification)

4. Basic usage of PEFT

Here we use LoRA to fine-tune the bigscience/mt0-large model, which has 1.2B parameters, to generate classification labels.

4-1. PeftConfig

Each PEFT method is defined by a PeftConfig class, which stores all the important parameters used to build a PeftModel. Since we are fine-tuning with the LoRA method, we create a LoraConfig. Its important parameters are as follows:

task_type, the type of task

inference_mode, whether the model will be used for inference

r, the rank of the low-rank matrices

lora_alpha, the scaling factor for the low-rank matrices

lora_dropout, the dropout probability of the LoRA layers

from peft import LoraConfig, TaskType

# LoRA configuration for a sequence-to-sequence language modeling task
peft_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,  # seq2seq LM task
    inference_mode=False,             # we are training, not running inference
    r=8,                              # rank of the low-rank update matrices
    lora_alpha=32,                    # scaling factor for the update matrices
    lora_dropout=0.1,                 # dropout applied inside the LoRA layers
)

4-2. PeftModel

First, we load the model to be fine-tuned:

from transformers import AutoModelForSeq2SeqLM

# Load the pre-trained base model
model_name_or_path = "bigscience/mt0-large"
tokenizer_name_or_path = "bigscience/mt0-large"
model = AutoModelForSeq2SeqLM.from_pretrained(model_name_or_path)
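The tokenizer can be loaded in the same way. This small addition is not part of the original example, but it uses the standard Transformers AutoTokenizer API:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(tokenizer_name_or_path)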

Then, use the get_peft_model() function to create a PeftModel. get_peft_model() takes the model to be fine-tuned and the corresponding PeftConfig. If we want to know the number of trainable parameters in the model, we can call the print_trainable_parameters method. The output below shows that we train only 0.19% of the model's parameters, a tiny fraction compared with the full model.

from peft import get_peft_model

# Wrap the base model with the LoRA adapters defined in peft_config
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()
# output: trainable params: 2359296 || all params: 1231940608 || trainable%: 0.19151053100118282

Next, we can train the model with the Transformers Trainer, with Accelerate, or with a plain PyTorch training loop.
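As an illustration, here is a minimal PyTorch training-loop sketch. The train_dataloader, the learning rate, and the number of epochs are assumptions for the sketch, not part of the original example; the dataloader is expected to yield tokenized batches that include a labels key:

import torch
from torch.optim import AdamW

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
# Only the LoRA parameters require gradients, so only they are updated
optimizer = AdamW(model.parameters(), lr=1e-3)

model.train()
for epoch in range(3):
    for batch in train_dataloader:  # assumed: yields dicts of tensors with a "labels" key
        batch = {k: v.to(device) for k, v in batch.items()}
        outputs = model(**batch)    # the loss is computed internally from "labels"
        outputs.loss.backward()
        optimizer.step()
        optimizer.zero_grad()

After training is complete, use the save_pretrained function to save the adapter to a directory.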

model.save_pretrained("output_dir")

Saving produces two files. adapter_model.bin is only a few MB to a few dozen MB in size, depending on the number of parameters trained:

adapter_config.json
adapter_model.bin
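To reuse the trained adapter later, load the base model and then attach the saved adapter weights with PeftModel.from_pretrained. A minimal sketch, assuming the "output_dir" used above:

from transformers import AutoModelForSeq2SeqLM
from peft import PeftModel

# Load the frozen base model, then attach the trained LoRA adapter
base_model = AutoModelForSeq2SeqLM.from_pretrained("bigscience/mt0-large")
model = PeftModel.from_pretrained(base_model, "output_dir")
model.eval()  # ready for inference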
