LoRA
https://huggingface.co/docs/peft/conceptual_guides/lora
This conceptual guide gives a brief overview of LoRA, a technique that speeds up fine-tuning of large models while consuming less memory.
To make fine-tuning more efficient, LoRA uses a low-rank decomposition to represent weight updates with two smaller matrices (called update matrices). These new matrices can be trained to adapt to the new data while keeping the overall number of changes low. The original weight matrix remains frozen and does not receive any further adjustments. To produce the final result, the original and the adapted weights are combined.
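As a concrete sketch in plain NumPy (illustrative sizes, not PEFT's actual implementation): the frozen weight `W` is adapted by the product of two small update matrices, and only those matrices are trained.

```python
import numpy as np

d, k, r = 768, 768, 8                # weight shape and a low rank (illustrative)
rng = np.random.default_rng(0)

W = rng.standard_normal((d, k))      # original weight: frozen during fine-tuning
A = rng.standard_normal((r, k))      # update matrices: the only trainable parts
B = np.zeros((d, r))                 # B starts at zero, so B @ A = 0 initially

W_adapted = W + B @ A                # combining original and adapted weights

full_params = d * k                  # what a full fine-tune would train: 589824
lora_params = r * (d + k)            # what LoRA trains instead: 12288
```

Because `B` is initialized to zero, the adapted weight equals the pretrained weight at the start of training, so fine-tuning begins exactly from the pretrained model.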
This approach has many advantages:
- LoRA makes fine-tuning more efficient by drastically reducing the number of trainable parameters.
- The original pretrained weights are kept frozen, which means you can have multiple lightweight and portable LoRA models for various downstream tasks built on top of them.
- LoRA is orthogonal to, and can be combined with, many other parameter-efficient methods.
- The performance of the model fine-tuned using LoRA is comparable to that of the fully fine-tuned model.
- LoRA does not add any inference latency because the adapter weights can be merged with the base model.
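The zero-latency claim can be checked numerically: applying the merged weight is mathematically identical to running the base layer plus the adapter path, so after merging there is no extra computation at inference time. A NumPy sketch with illustrative sizes:

```python
import numpy as np

rng = np.random.default_rng(1)
d, k, r = 64, 32, 4
W = rng.standard_normal((d, k))      # frozen base weight
B = rng.standard_normal((d, r))      # trained adapter matrices
A = rng.standard_normal((r, k))
x = rng.standard_normal(k)

unmerged = W @ x + B @ (A @ x)       # base path + adapter path at inference
merged = (W + B @ A) @ x             # one matmul with the merged weight

assert np.allclose(unmerged, merged)
```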
In principle, LoRA can be applied to any subset of the weight matrices in a neural network to reduce the number of trainable parameters. However, for simplicity and further parameter efficiency, in Transformer models LoRA is typically applied only to the attention blocks.
The number of trainable parameters in a LoRA model depends on the size of the low-rank update matrices, which is determined mainly by the rank and by the shape of the original weight matrix.
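For a single `d × k` weight matrix, the update matrices have shapes `(d, r)` and `(r, k)`, so LoRA trains `r * (d + k)` parameters instead of `d * k`. A quick back-of-the-envelope check (sizes are illustrative):

```python
def lora_trainable_params(d: int, k: int, r: int) -> int:
    # update matrices B (d x r) and A (r x k)
    return r * (d + k)

full = 4096 * 4096                               # 16,777,216 params for a full update
low_rank = lora_trainable_params(4096, 4096, 8)  # 65,536 params, ~0.4% of full
```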
Common LoRA parameters in PEFT
As with other methods supported by PEFT, to fine-tune a model with LoRA you need to:

- Instantiate a base model.
- Create a configuration (`LoraConfig`) where you define the LoRA-specific parameters.
- Wrap the base model with `get_peft_model()` to get a trainable `PeftModel`.
- Train the `PeftModel` as you would normally train the base model.
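The mechanics these steps wrap can be sketched end to end in plain NumPy (a toy example, not PEFT's implementation): the base weight stays frozen, only the update matrices receive gradient steps, and the result can be merged afterwards.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r, lr = 16, 16, 2, 0.001

W = rng.standard_normal((d, k))      # frozen pretrained weight
W0 = W.copy()                        # kept only to verify W never changes
A = rng.standard_normal((r, k))      # trainable update matrices
B = np.zeros((d, r))                 # B starts at zero, so B @ A = 0

x = rng.standard_normal(k)           # one toy training example
y = rng.standard_normal(d)

initial_loss = 0.5 * np.sum((W @ x - y) ** 2)
for _ in range(2000):
    err = (W + B @ A) @ x - y        # gradient of 0.5 * ||y_hat - y||^2
    B -= lr * np.outer(err, A @ x)   # dL/dB = err (A x)^T
    A -= lr * B.T @ np.outer(err, x) # dL/dA = B^T err x^T
    # W is never touched: only the low-rank matrices are trained

final_loss = 0.5 * np.sum(((W + B @ A) @ x - y) ** 2)
W_merged = W + B @ A                 # merge for latency-free inference
```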
`LoraConfig` allows you to control how LoRA is applied to the base model through the following parameters:
- `r`: the rank of the update matrices, expressed as an `int`. A lower rank results in smaller update matrices with fewer trainable parameters.
- `target_modules`: the modules (for example, the attention blocks) to apply the LoRA update matrices to.
- `lora_alpha`: the LoRA scaling factor.
- `bias`: specifies whether the `bias` parameters should be trained. Can be `'none'`, `'all'` or `'lora_only'`.
- `modules_to_save`: list of modules, apart from the LoRA layers, to be set as trainable and saved in the final checkpoint. These typically include the model's custom head, which is randomly initialized for the fine-tuning task.
- `layers_to_transform`: list of layers to be transformed by LoRA. If not specified, all layers in `target_modules` are transformed.
- `layers_pattern`: pattern to match layer names in `target_modules`, if `layers_to_transform` is specified. By default, `PeftModel` looks at common layer patterns (`layers`, `h`, `blocks`, etc.); use this option for exotic and custom models.
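A hedged configuration sketch tying these parameters together (the values, and module names like `"q_proj"`, are illustrative and depend on the base model's architecture; requires the `peft` package):

```python
from peft import LoraConfig

config = LoraConfig(
    r=8,                                  # rank of the update matrices
    lora_alpha=16,                        # LoRA scaling factor
    target_modules=["q_proj", "v_proj"],  # modules that receive update matrices
    bias="none",                          # do not train bias parameters
    modules_to_save=["classifier"],       # e.g. a randomly initialized task head
)
```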