LoRA fine-tuning (from the official Hugging Face documentation)

LoRA: https://huggingface.co/docs/peft/conceptual_guides/lora


This conceptual guide provides a brief overview of LoRA, a technique that speeds up fine-tuning of large models while consuming less memory.

To make fine-tuning more efficient, LoRA represents the weight updates with a low-rank decomposition into two smaller matrices (called update matrices). These new matrices are trained to adapt to the new data while keeping the overall number of changes low. The original weight matrix remains frozen and receives no further adjustments. To produce the final result, the original and adapted weights are combined.
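As an illustration, here is a minimal sketch of this decomposition in plain PyTorch (the weight shape and rank below are assumptions chosen for the example; this is not the PEFT implementation):

```python
import torch

d, k, r = 4096, 4096, 8                     # example weight shape and LoRA rank (assumptions)

W = torch.randn(d, k, requires_grad=False)  # original pretrained weight, kept frozen
A = torch.randn(r, k) * 0.01                # update matrix A (r x k), trainable
B = torch.zeros(d, r)                       # update matrix B (d x r), trainable; zero init so the update starts at zero
A.requires_grad_()
B.requires_grad_()

x = torch.randn(1, k)
y = x @ (W + B @ A).T                       # final result: frozen weight plus low-rank update

print(W.numel())                            # 16,777,216 frozen parameters
print(A.numel() + B.numel())                # 65,536 trainable parameters
```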

This approach has many advantages:

  • LoRA makes fine-tuning more efficient by drastically reducing the number of trainable parameters.
  • The original pretrained weights remain frozen, which means you can have multiple lightweight and portable LoRA models for various downstream tasks built on top of them.
  • LoRA is orthogonal to, and can be combined with, many other parameter-efficient methods.
  • The performance of the model fine-tuned using LoRA is comparable to that of the fully fine-tuned model.
  • LoRA does not add any inference latency because the adapter weights can be merged with the base model (see the merge sketch after this list).
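To illustrate the last point, PEFT's merge_and_unload() folds the adapter weights back into the base weights. A minimal sketch, where the model id and adapter path are placeholders:

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("base-model-id")     # placeholder model id
model = PeftModel.from_pretrained(base, "path/to/lora-adapter")  # placeholder adapter path

# Fold B @ A into the frozen weights; the merged model runs with no extra latency
merged = model.merge_and_unload()
```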

In principle, LoRA can be applied to any subset of weight matrices in a neural network to reduce the number of trainable parameters. However, for simplicity and further parameter efficiency, in Transformer models LoRA is usually only applied to the attention blocks. The number of trainable parameters in a LoRA model depends on the size of the low-rank update matrices, which is determined mainly by the rank and the shape of the original weight matrix.
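Concretely, for a weight matrix of shape d × k, LoRA trains r × (d + k) parameters instead of d × k. As an illustrative example, a 4096 × 4096 attention projection with r = 8 has 8 × (4096 + 4096) = 65,536 trainable parameters instead of 16,777,216, roughly 0.4% of the original.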

Common LoRA parameters in PEFT

As with other methods supported by PEFT, to fine-tune a model with LoRA you need to (see the sketch after this list):

  1. Instantiate the base model.
  2. Create a configuration (LoraConfig) where you define the LoRA-specific parameters.
  3. Wrap the base model with get_peft_model() to get a trainable PeftModel.
  4. Train the PeftModel as you would normally train the base model.
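A minimal sketch of these four steps, assuming a sequence-classification task (the model id and hyperparameters are illustrative):

```python
from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, get_peft_model, TaskType

# 1. Instantiate the base model (the model id is an assumption for this example)
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# 2. Create a configuration with the LoRA-specific parameters
config = LoraConfig(task_type=TaskType.SEQ_CLS, r=8, lora_alpha=16, lora_dropout=0.1)

# 3. Wrap the base model to get a trainable PeftModel
model = get_peft_model(model, config)
model.print_trainable_parameters()  # prints trainable vs. total parameter counts

# 4. Train `model` as you would normally train the base model,
#    e.g. with transformers.Trainer or a plain PyTorch loop
```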

LoraConfig allows you to control how LoRA is applied to the base model via the following parameters (an example configuration follows the list):

  • r: the rank of the update matrices, expressed as an int. A lower rank results in smaller update matrices with fewer trainable parameters.
  • target_modules: the modules (e.g., attention blocks) to apply the LoRA update matrices to.
  • alpha: LoRA scaling factor.
  • bias: specifies whether the bias parameters should be trained. Can be 'none', 'all', or 'lora_only'.
  • modules_to_save: list of modules, apart from the LoRA layers, to be set as trainable and saved in the final checkpoint. These typically include the model's custom head, which is randomly initialized for the fine-tuning task.
  • layers_to_transform: list of layers to be transformed by LoRA. If not specified, all layers in target_modules are transformed.
  • layers_pattern: pattern to match layer names in target_modules if layers_to_transform is specified. By default, PeftModel looks at common layer patterns (layers, h, blocks, etc.); use this for exotic and custom models.
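Put together, a hedged example configuration might look like the sketch below. Note that in code the scaling factor is passed as lora_alpha; the target module and head names are assumptions that depend on the model architecture:

```python
from peft import LoraConfig

config = LoraConfig(
    r=8,                                  # rank of the update matrices
    lora_alpha=16,                        # LoRA scaling factor ("alpha" above)
    target_modules=["q_proj", "v_proj"],  # assumed attention projections; names vary by model
    bias="none",                          # leave bias parameters frozen
    modules_to_save=["classifier"],       # assumed custom head to train and save alongside the adapter
)
```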


Origin: blog.csdn.net/sinat_37574187/article/details/131568265