Use Transformers' Trainer to fine-tune pre-trained large models in PyTorch

Background

Transformers provides a very convenient API for fine-tuning large models. Let's walk through the steps of using the Trainer class to fine-tune a large model.

Step 1: Load the pre-trained large model

from transformers import AutoModelForSequenceClassification

# The pre-trained backbone is loaded with a new, randomly initialized
# classification head (two labels by default) that is trained during fine-tuning.
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")

Step 2: Set training hyperparameters

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="path/to/save/folder/",
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=2,
)

For example, num_train_epochs=2 here trains for two full passes over the training set.
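
If you also want evaluation and checkpointing during training, TrainingArguments accepts options like the ones below. This is a sketch with illustrative values; note that recent Transformers releases rename evaluation_strategy to eval_strategy:

training_args = TrainingArguments(
    output_dir="path/to/save/folder/",
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=2,
    weight_decay=0.01,            # optimizer weight decay (L2-style regularization)
    evaluation_strategy="epoch",  # evaluate on eval_dataset at the end of each epoch
    save_strategy="epoch",        # save a checkpoint at the end of each epoch
    logging_steps=50,             # log the training loss every 50 optimizer steps
)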

Step 3: Load the tokenizer

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
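
A quick sanity check shows what the tokenizer produces; for DistilBERT it returns input_ids and an attention_mask (the example string is arbitrary):

encoding = tokenizer("Hello, world!")
print(encoding.keys())  # dict_keys(['input_ids', 'attention_mask'])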

Step 4: Load the dataset

from datasets import load_dataset

dataset = load_dataset("rotten_tomatoes")
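
Printing the result shows that rotten_tomatoes is a DatasetDict with train/validation/test splits, each containing "text" and "label" columns (row counts as of this writing):

print(dataset)
# DatasetDict({
#     train: Dataset({features: ['text', 'label'], num_rows: 8530})
#     validation: Dataset({features: ['text', 'label'], num_rows: 1066})
#     test: Dataset({features: ['text', 'label'], num_rows: 1066})
# })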

Step 5: Create a tokenization function and specify which field of the dataset needs to be tokenized:

def tokenize_dataset(dataset):
    # Tokenize the "text" column; truncate anything longer than the model's maximum input length
    return tokenizer(dataset["text"], truncation=True)

Step 6: Call map() to apply the tokenization function to the entire dataset (batched=True tokenizes many rows per call, which is much faster):

dataset = dataset.map(tokenize_dataset, batched=True)
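
map() keeps the original columns and appends the tokenizer's outputs, which you can verify like this:

print(dataset["train"].column_names)
# ['text', 'label', 'input_ids', 'attention_mask']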

Step 7: Use DataCollatorWithPadding to pad each batch dynamically; padding to the longest sequence in the batch rather than to a global maximum keeps training fast:

from transformers import DataCollatorWithPadding

data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
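
As a minimal sketch of what the collator does, you can call it directly on two encodings of different lengths; it pads both to the length of the longer one rather than to a fixed global length:

batch = data_collator([
    tokenizer("a short review"),
    tokenizer("a much, much longer review than the first one"),
])
print(batch["input_ids"].shape)  # torch.Size([2, <length of the longer sequence>])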

Step 8: Initialize Trainer

from transformers import Trainer

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
    data_collator=data_collator,
)
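
By default the Trainer only reports the evaluation loss. If you also want accuracy, you can pass a compute_metrics function when constructing the Trainer; here is a minimal sketch that assumes the separate evaluate library is installed:

import numpy as np
import evaluate

accuracy = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    # eval_pred is a (logits, labels) pair produced during evaluation
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return accuracy.compute(predictions=predictions, references=labels)

# Then construct the Trainer with the extra argument:
# trainer = Trainer(..., compute_metrics=compute_metrics)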

Step 9: Start training

trainer.train()
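
After training finishes, you can evaluate on the held-out split and save the fine-tuned model; both calls below are standard Trainer methods (the save path is illustrative):

metrics = trainer.evaluate()
print(metrics)  # e.g. {'eval_loss': ..., 'eval_runtime': ..., ...}

trainer.save_model("path/to/save/folder/")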

Summary

Using the API provided by Trainer, you can fine-tune a large model in just nine simple steps and a dozen or so lines of code. Why not give it a try?
