[NLP] LLM: Fine-tune your own Llama 2 model

1. Dataset preparation

Let’s talk a bit about the parameters we can tune here. First, we want to load a llama-2-7b-hf model and train it on the mlabonne/guanaco-llama2-1k dataset (1,000 samples), which will produce our fine-tuned model llama-2-7b-miniguanaco. If you’re interested in how this dataset was created, you can check this notebook. Feel free to change it: there are many good datasets on the Hugging Face Hub, like databricks/databricks-dolly-15k.

Use the following code to prepare an offline copy of the dataset (the GPU machines cannot connect to the Internet): download it once on a connected machine and save it to disk.

import os
from datasets import load_from_disk, load_dataset

print("loading dataset...")
dataset_name = "mlabonne/guanaco-llama2-1k"
dataset = load_dataset(path=dataset_name, split="train", download_mode="reuse_dataset_if_exists")
print(dataset)

offline_dataset_path = "./guanaco-llama2-1k-offline"
os.makedirs(offline_dataset_path, exist_ok=True)

print("save to disk...")
dataset.save_to_disk(offline_dataset_path)
print("load from disk")
dataset = load_from_disk(offline_dataset_path)
print(dataset)
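
On the offline GPU machine, you can additionally force the Hugging Face libraries into offline mode so they never attempt to reach the Hub. The following is an optional, minimal sketch using the standard HF_DATASETS_OFFLINE and TRANSFORMERS_OFFLINE environment variables; it assumes the directory saved above has been copied to the GPU machine.

import os

# Set offline mode before importing datasets so no network calls are attempted
os.environ["HF_DATASETS_OFFLINE"] = "1"
os.environ["TRANSFORMERS_OFFLINE"] = "1"

from datasets import load_from_disk

# Load the previously saved offline copy of the dataset
dataset = load_from_disk("./guanaco-llama2-1k-offline")
print(dataset)

Each sample in the dataset is a single text field that already follows the Llama 2 chat prompt template, for example: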

<s>[INST] What should I do if the 32-bit application has insufficient memory[/INST] 1. Delete unnecessary data and applications: Intelligent Manager (Memory Manager) - Storage Space/Memory. \n2. Close unnecessary applications: Click the recent tasks key in the lower left corner of the screen - click the "X" in the upper right corner of the application you want to close/click "Close All" below. \n3. If the phone supports memory cards, move the multimedia files to the expansion card or set the shooting storage path to the expansion card. If you do not have a memory card, it is recommended to move multimedia files to your computer regularly. \n4. Uninstall software or files that are not frequently used. </s>

['<s>[INST] How should I comfort myself when I am broken up in love? [/INST] Falling out of love is a very painful experience, but in this process, you need to learn how to comfort yourself to help yourself recover. Here are some suggestions to help you comfort yourself:\n\n1. Allow yourself to feel the pain: Falling out of love is a normal emotional reaction, and you don't need to hide or suppress your feelings. Allow yourself to slowly face this fact and accept your emotions. \n\n2. Find support: Find friends, family members or psychological counselors who you can talk to and let them help you resolve your emotions and listen to your worries. Don't face this problem alone. \n\n3. Do something that makes you feel good: find something that makes you happy, such as exercising, reading, listening to music, traveling, etc. These activities can help distract you and reduce pain. \n\n4. Pay attention to your own growth: Falling out of love is an opportunity for growth. It allows you to better understand yourself and learn how to deal with emotions and interpersonal relationships. Pay attention to your own growth and don’t forget that this process is also meaningful. \n\n5. Give yourself time: Falling out of love takes time to heal, don't force yourself to recover soon. Give yourself enough time to process the process and slowly regain your mindset. \n\nBeing broken up is a painful experience, but it is also an opportunity to learn and grow. I believe that I can overcome this difficulty and the future will be better. </s><s>[INST] Can you comfort me who is heartbroken? [/INST] Speaking of which, you may be disappointed. As an AI language model that processes text, although I can give you some text answers based on what you express, I do not have the real ability to understand real events. , and the ability to empathize with real people. But I will be happy to provide you with some comfort and support, if you are willing to share your feelings and thoughts, and let me know what kind of help and support you need, I will do my best to help you through this sad time period. </s>']

import pandas as pd
from datasets import load_dataset

# Initialize lists to store prompts and responses
prompts = []
responses = []

# prev_examples should hold your raw example strings (e.g. generated with
# GPT-LLM-Trainer), each with prompt and response separated by '-----------'
prev_examples = []

# Parse out prompts and responses from the examples
for example in prev_examples:
    try:
        split_example = example.split('-----------')
        prompts.append(split_example[1].strip())
        responses.append(split_example[3].strip())
    except IndexError:
        # Skip examples that don't match the expected delimiter format
        pass

# Create a DataFrame
df = pd.DataFrame({
    'prompt': prompts,
    'response': responses
})

# Remove duplicates
df = df.drop_duplicates()

print('There are ' + str(len(df)) + ' successfully-generated examples. Here are the first few:')

df.head()

# Split the data into train and test sets, with 90% in the train set
train_df = df.sample(frac=0.9, random_state=42)
test_df = df.drop(train_df.index)

# Save the dataframes to .jsonl files
train_df.to_json('train.jsonl', orient='records', lines=True)
test_df.to_json('test.jsonl', orient='records', lines=True)

# Load datasets
train_dataset = load_dataset('json', data_files='train.jsonl', split="train")
valid_dataset = load_dataset('json', data_files='test.jsonl', split="train")

# Preprocess datasets: wrap each prompt/response pair in the Llama 2 chat template
train_dataset_mapped = train_dataset.map(
    lambda examples: {'text': ['<s>[INST] ' + prompt + ' [/INST] ' + response + ' </s>'
                               for prompt, response in zip(examples['prompt'], examples['response'])]},
    batched=True)
valid_dataset_mapped = valid_dataset.map(
    lambda examples: {'text': ['<s>[INST] ' + prompt + ' [/INST] ' + response + ' </s>'
                               for prompt, response in zip(examples['prompt'], examples['response'])]},
    batched=True)

# trainer = SFTTrainer(
#     model=model,
#     train_dataset=train_dataset_mapped,
#     eval_dataset=valid_dataset_mapped,  # Pass validation dataset here
#     peft_config=peft_config,
#     dataset_text_field="text",
#     max_seq_length=max_seq_length,
#     tokenizer=tokenizer,
#     args=training_arguments,
#     packing=packing,
# )

If you are using your own dataset, you first need to process the training data: it has two columns, prompt and response, and we need to combine them into a single text field, separated by the template's formatting tokens, as in the following format:

{'text': '<s>[INST] ' + prompt + ' [/INST] ' + response + ' </s>'}
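
For illustration, a small named format function applied with datasets.map produces the same text column. This is a minimal sketch with a made-up one-row dataset, not part of the original script; in practice you would apply it to the train/valid datasets loaded above.

from datasets import Dataset

def format_example(example):
    # Combine the two columns into one text field using the Llama 2 chat template
    example["text"] = "<s>[INST] " + example["prompt"] + " [/INST] " + example["response"] + " </s>"
    return example

# Tiny in-memory example (hypothetical data, for demonstration only)
demo = Dataset.from_dict({
    "prompt": ["Where is the capital of China?"],
    "response": ["The capital of China is Beijing."],
})
demo = demo.map(format_example)
print(demo[0]["text"])
# <s>[INST] Where is the capital of China? [/INST] The capital of China is Beijing. </s>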

2. Start fine-tuning Llama 2

In this section, we will fine-tune a Llama 2 model with 7 billion parameters on an A800 GPU (80 GB) with high RAM. Even then, full fine-tuning is impractical, so we need parameter-efficient fine-tuning (PEFT) techniques like LoRA or QLoRA.

QLoRA will use a rank of 64 with a scaling parameter of 16 (see this article for more information about LoRA parameters). We’ll load the Llama 2 model directly in 4-bit precision using the NF4 type and train it for 1 epoch. To get more information about the other parameters, check the TrainingArguments, PeftModel, and SFTTrainer documentation.

import os
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    HfArgumentParser,
    TrainingArguments,
    pipeline,
    logging,
)
from peft import LoraConfig, PeftModel
from trl import SFTTrainer

# Load dataset (you can process it here)

# from datasets import load_dataset
#
# print("loading dataset")
# dataset_name = "mlabonne/guanaco-llama2-1k"
# dataset = load_dataset(dataset_name, split="train")
# dataset.save_to_disk('./guanaco-llama2-1k-offline')

from datasets import load_from_disk

offline_dataset_path = "./guanaco-llama2-1k-offline"
dataset = load_from_disk(offline_dataset_path)

# The model that you want to train from the Hugging Face hub
model_name = "/home/work/llama-2-7b"

# Fine-tuned model name
new_model = "llama-2-7b-miniguanaco"

################################################################################
# QLoRA parameters
################################################################################

# LoRA attention dimension
lora_r = 64

# Alpha parameter for LoRA scaling
lora_alpha = 16

# Dropout probability for LoRA layers
lora_dropout = 0.1

################################################################################
# bitsandbytes parameters
################################################################################

# Activate 4-bit precision base model loading
use_4bit = True

# Compute dtype for 4-bit base models
bnb_4bit_compute_dtype = "float16"

# Quantization type (fp4 or nf4)
bnb_4bit_quant_type = "nf4"

# Activate nested quantization for 4-bit base models (double quantization)
use_nested_quant = False

################################################################################
# TrainingArguments parameters
################################################################################

# Output directory where the model predictions and checkpoints will be stored
output_dir = "./results"

# Number of training epochs
num_train_epochs = 1

# Enable fp16/bf16 training (set bf16 to True with an A100)
fp16 = False
bf16 = False

# Batch size per GPU for training
per_device_train_batch_size = 16

# Batch size per GPU for evaluation
per_device_eval_batch_size = 16

# Number of update steps to accumulate the gradients for
gradient_accumulation_steps = 1

# Enable gradient checkpointing
gradient_checkpointing = True

# Maximum gradient norm (gradient clipping)
max_grad_norm = 0.3

# Initial learning rate (AdamW optimizer)
learning_rate = 2e-4

# Weight decay to apply to all layers except bias/LayerNorm weights
weight_decay = 0.001

# Optimizer to use
optim = "paged_adamw_32bit"

# Learning rate schedule
lr_scheduler_type = "cosine"

# Number of training steps (overrides num_train_epochs)
max_steps = -1

# Ratio of steps for a linear warmup (from 0 to learning rate)
warmup_ratio = 0.03

# Group sequences into batches with same length
# Saves memory and speeds up training considerably
group_by_length = True

# Save checkpoint every X updates steps
save_steps = 0

# Log every X updates steps
logging_steps = 25

################################################################################
# SFT parameters
################################################################################

# Maximum sequence length to use
max_seq_length = None

# Pack multiple short examples in the same input sequence to increase efficiency
packing = False

# Load the entire model on the GPU 0
device_map = {"": 0}

# Load tokenizer and model with QLoRA configuration
compute_dtype = getattr(torch, bnb_4bit_compute_dtype)

bnb_config = BitsAndBytesConfig(
    load_in_4bit=use_4bit,
    bnb_4bit_quant_type=bnb_4bit_quant_type,
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=use_nested_quant,
)

# Check GPU compatibility with bfloat16
if compute_dtype == torch.float16 and use_4bit:
    major, _ = torch.cuda.get_device_capability()
    if major >= 8:
        print("=" * 80)
        print("Your GPU supports bfloat16: accelerate training with bf16=True")
        print("=" * 80)

# Load base model
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map=device_map
)
model.config.use_cache = False
model.config.pretraining_tp = 1

# Load LLaMA tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"  # Fix weird overflow issue with fp16 training

# Load LoRA configuration
peft_config = LoraConfig(
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    r=lora_r,
    bias="none",
    task_type="CAUSAL_LM",
)
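
# Optional sanity check (illustrative, not part of the original script):
# wrap the base model with the LoRA config and print how few parameters are
# trainable. SFTTrainer performs this wrapping internally when peft_config is
# passed, so you can skip this step.
# from peft import get_peft_model
# get_peft_model(model, peft_config).print_trainable_parameters()
# With r=64 on the default target modules this reports roughly 33M trainable
# parameters out of ~6.7B (about 0.5%).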

# Set training parameters
training_arguments = TrainingArguments(
    output_dir=output_dir,
    num_train_epochs=num_train_epochs,
    per_device_train_batch_size=per_device_train_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    optim=optim,
    save_steps=save_steps,
    logging_steps=logging_steps,
    learning_rate=learning_rate,
    weight_decay=weight_decay,
    fp16=fp16,
    bf16=bf16,
    max_grad_norm=max_grad_norm,
    max_steps=max_steps,
    warmup_ratio=warmup_ratio,
    group_by_length=group_by_length,
    lr_scheduler_type=lr_scheduler_type
)

# Set supervised fine-tuning parameters
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    args=training_arguments,
    packing=packing,
)

# Train model
trainer.train()

# Save trained model
trainer.model.save_pretrained(new_model)

We can now load everything and start the fine-tuning process. We’re relying on multiple wrappers, so bear with me.

  • First of all, we want to load the dataset we defined. Here, our dataset is already preprocessed but, usually, this is where you would reformat the prompt, filter out bad text, combine multiple datasets, etc.
  • Then, we’re configuring bitsandbytes for 4-bit quantization.
  • Next, we’re loading the Llama 2 model in 4-bit precision on a GPU with the corresponding tokenizer.
  • Finally, we’re loading configurations for QLoRA, regular training parameters, and passing everything to the SFTTrainer. The training can finally start!

3. Merge LoRA weights and save the complete model

import os
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer
)
from peft import PeftModel

# Save trained model

model_name = "/home/work/llama-2-7b"

# Load the entire model on the GPU 0
device_map = {"": 0}

new_model = "llama-2-7b-miniguanaco"

# Reload model in FP16 and merge it with LoRA weights
base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    low_cpu_mem_usage=True,
    return_dict=True,
    torch_dtype=torch.float16,
    device_map=device_map,
)
model = PeftModel.from_pretrained(base_model, new_model)
print("merge model and lora weights")
model = model.merge_and_unload()

output_merged_dir = "final_merged_checkpoint2"
os.makedirs(output_merged_dir, exist_ok=True)
print("save model")
model.save_pretrained(output_merged_dir, safe_serialization=True)

# Reload tokenizer to save it
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

print("save tokenizer")
tokenizer.save_pretrained(output_merged_dir)

Load the fine-tuned model for conversation

After training for one epoch on 2,000 Chinese Alpaca samples, the model answers as follows: if the question is also wrapped in [INST] ... [/INST], the answer is accurate.

import torch 
from transformers import ( 
    AutoModelForCausalLM, 
    AutoTokenizer, 
) 

# The model that you want to train from the Hugging Face hub 
model_name = "./final_merged_alpaca" 

device_map = {"": 0} 

# Reload the model in FP16 
model = AutoModelForCausalLM.from_pretrained( 
    model_name, 
    low_cpu_mem_usage=True, 
    return_dict=True, 
    torch_dtype=torch.float16, 
    device_map=device_map, 
) 

# Reload the tokenizer to save it 
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True) 

from transformers import pipeline 
 
# prompt = f"[INST] Where is the capital of China? [/INST]" # Replace the command here with the command related to your task
# num_new_tokens = 100 # Change to the number of new tokens you want to generate 
# # Count the number of tokens in the prompt
# num_prompt_tokens = len(tokenizer(prompt)['input_ids']) 
# # Calculate the maximum length of a generation 
# max_length = num_prompt_tokens + num_new_tokens 

gen = pipeline('text-generation', model=model, tokenizer=tokenizer, max_length=256) 

print("############################################# ###############") 
prompt1 = "[INST] Where is the capital of China?[/INST]" # Replace the command here with the command related to your task 
result1 = gen(prompt1) 
print(prompt1) 
print(result1[0]['generated_text']) 

print("######################### ####################################") 
prompt2 = "Where is the capital of China?" # Replace the command here with the one relevant to your task 
result2 = gen(prompt2) 
print(result2[0]['generated_text'])

Before fine-tuning: (example output not shown).

4. Summary

In this article, we saw how to fine-tune a Llama-2-7b model. We introduced some necessary background on LLM training and fine-tuning, as well as important considerations related to instruction datasets. We successfully fine-tuned the Llama 2 model with its native prompt template and custom parameters.

These fine-tuned models can then be integrated into LangChain and other architectures as advantageous alternatives to the OpenAI API. Remember, in this new paradigm, instruction datasets are the new gold, and the quality of your model heavily depends on the data on which it’s been fine-tuned. So, good luck with building high-quality datasets!
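
As an illustration of that integration, the merged checkpoint can be wrapped in a transformers pipeline and handed to LangChain. This is a minimal sketch assuming langchain is installed; the HuggingFacePipeline wrapper has moved between langchain and langchain_community across versions, so the exact import path may differ.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from langchain.llms import HuggingFacePipeline  # in newer releases: langchain_community.llms

model_path = "final_merged_checkpoint2"  # merged model directory saved in section 3
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map={"": 0}
)

# Wrap the model in a text-generation pipeline and expose it as a LangChain LLM
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, max_new_tokens=128)
llm = HuggingFacePipeline(pipeline=pipe)

print(llm.invoke("[INST] Where is the capital of China? [/INST]"))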

ML Blog - Fine-Tune Your Own Llama 2 Model in a Colab Notebook (mlabonne.github.io)

Extended Guide: Instruction-tune Llama 2 (philschmid.de)

How to fine-tune LLaMA 2 using SFT, LoRA (accubits.com)

Fine-Tuning LLaMA 2 Models using a single GPU, QLoRA and AI Notebooks (OVHcloud Blog)

GPT-LLM-Trainer: How to fine-tune and train LLMs easily and quickly using your own data (CSDN Blog)


Original article: blog.csdn.net/zwqjoy/article/details/132765488