1. Background introduction
Recently, more and more large language models have been open-sourced, but for an individual, the cost of training a large model from scratch is prohibitive. This article therefore introduces PEFT, an efficient fine-tuning framework for large models.
GitHub address: https://github.com/huggingface/peft/tree/main
PEFT, which stands for Parameter-Efficient Fine-Tuning, is a library developed by Hugging Face that can efficiently adapt pretrained language models (PLMs) to various downstream applications without fine-tuning all of the model's parameters.
2. Supported methods
Currently, PEFT supports the following methods of parameter fine-tuning:
LoRA: LoRA: Low-Rank Adaptation of Large Language Models
Prefix Tuning: Prefix-Tuning: Optimizing Continuous Prompts for Generation; P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks
P-Tuning: GPT Understands, Too
Prompt Tuning: The Power of Scale for Parameter-Efficient Prompt Tuning
AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning
The principles behind each of these methods will be explained in later chapters.
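Before those chapters, the core idea behind LoRA can be sketched in plain NumPy (a conceptual illustration only, not PEFT's implementation): the pretrained weight W stays frozen, and only a low-rank product B·A, scaled by alpha/r, is trained on top of it.

```python
import numpy as np

# Conceptual LoRA sketch (not PEFT's implementation): the pretrained
# weight W is frozen; only the low-rank factors A and B are trained.
d_out, d_in, r, alpha = 64, 64, 8, 32

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable down-projection, r x d_in
B = np.zeros((d_out, r))                    # trainable up-projection, init to zero

x = rng.standard_normal(d_in)
h = W @ x + (alpha / r) * (B @ (A @ x))     # forward pass with the LoRA branch added

# Because B starts at zero, the adapted model initially matches the base model.
assert np.allclose(h, W @ x)
```

Only `A.size + B.size = r * (d_in + d_out)` parameters are trained, which is what makes the method parameter-efficient.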
3. Supported models
The list of models currently supported by PEFT is as follows:
3-1. Causal Language Modeling
3-2. Conditional Generation
3-3. Sequence Classification
3-4. Token Classification
(The supported-model tables for each task were images; see the PEFT repository for the current lists.)
4. Simple use of PEFT
Here we use LoRA to fine-tune bigscience/mt0-large, a model with 1.2B parameters, to generate classification labels.
4-1、PeftConfig
Each PEFT method is defined by a PeftConfig class, which stores all of the important parameters used to build a PeftModel. Since we are fine-tuning with the LoRA method, we create a LoraConfig. Its important parameters are:
task_type: the task type
inference_mode: whether the model will be used for inference
r: the rank (dimension) of the low-rank update matrices
lora_alpha: the scaling factor for the low-rank matrices
lora_dropout: the dropout probability of the LoRA layers
from peft import LoraConfig, TaskType
peft_config = LoraConfig(task_type=TaskType.SEQ_2_SEQ_LM, inference_mode=False, r=8, lora_alpha=32, lora_dropout=0.1)
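To get a rough sense of what r=8 buys (a back-of-the-envelope sketch; the 1024 hidden size is assumed for illustration, and actual counts depend on which modules PEFT wraps), LoRA replaces a full d_out × d_in weight update with only r·(d_in + d_out) trainable parameters, with the update scaled by lora_alpha / r:

```python
# Back-of-the-envelope LoRA parameter count for one adapted weight matrix.
# The 1024x1024 shape is an assumed example, not mt0-large's actual size.
r, lora_alpha = 8, 32
d_in, d_out = 1024, 1024

full_update = d_in * d_out        # 1,048,576 params to fine-tune this matrix fully
lora_update = r * (d_in + d_out)  # 16,384 trainable params with rank-8 LoRA
scaling = lora_alpha / r          # 4.0: factor applied to the low-rank update

print(f"{lora_update / full_update:.2%} of the full update")  # 1.56%
```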
4-2、PeftModel
First, we load the model that we want to fine-tune:
from transformers import AutoModelForSeq2SeqLM
model_name_or_path = "bigscience/mt0-large"
tokenizer_name_or_path = "bigscience/mt0-large"
model = AutoModelForSeq2SeqLM.from_pretrained(model_name_or_path)
Then, use the get_peft_model() function to create a PeftModel; it takes the model to fine-tune and the corresponding PeftConfig. To see how many parameters are trainable, call the print_trainable_parameters method. The output shows that only 0.19% of the model's parameters are trained, which is a tiny fraction of the original model.
from peft import get_peft_model
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()
"output: trainable params: 2359296 || all params: 1231940608 || trainable%: 0.19151053100118282"
Next, we can train the model with the Transformers Trainer, with Accelerate, or with a plain PyTorch training loop. After training is complete, use the save_pretrained function to save the model to a directory.
model.save_pretrained("output_dir")
After saving, two files are written. adapter_model.bin is only a few MB to a few dozen MB in size, depending on the number of parameters trained:
adapter_config.json
adapter_model.bin
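To use the saved adapter later, it can be attached back onto the base model with PeftModel.from_pretrained (a sketch using the paths from the example above; it downloads the base model, so it is not reproduced here at runtime):

```python
from transformers import AutoModelForSeq2SeqLM
from peft import PeftConfig, PeftModel

# Read the adapter config to find the base model it was trained on,
# then attach the saved LoRA weights from "output_dir" to that base model.
peft_config = PeftConfig.from_pretrained("output_dir")
base_model = AutoModelForSeq2SeqLM.from_pretrained(peft_config.base_model_name_or_path)
model = PeftModel.from_pretrained(base_model, "output_dir")
model.eval()  # ready for inference with the fine-tuned adapter
```

Only the small adapter weights need to be distributed; the base model is fetched separately by name.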