LLMs: Instruction Fine-Tuning

Last week, you were introduced to the lifecycle of a generative AI project. You explored example use cases for large language models and discussed the types of tasks they are capable of performing.

In this lesson, you will learn how to improve the performance of existing models for specific use cases.

You'll also learn important metrics that can be used to evaluate the performance of your fine-tuned LLM and quantify its improvement over the base model you started with.

Let's first discuss how to fine-tune an LLM with instruction prompts. Earlier in the course, you saw that some models are able to recognize the instructions contained in a prompt and correctly carry out zero-shot inference,

while other, smaller LLMs may fail to perform the task, as in the example shown here.

You also saw that including one or more examples of what you want the model to do (known as one-shot or few-shot inference) might be enough to help the model identify the task and generate a good completion.

However, this strategy has several disadvantages.

  1. First, for smaller models this doesn't always work, even when five or six examples are included.
  2. Second, any examples you include in the prompt take up valuable space in the context window, reducing the room you have for other useful information, as the sketch below illustrates.
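To make that context-window cost concrete, here is a minimal sketch of how a few-shot prompt might be assembled; the reviews and labels are hypothetical, and every example added consumes tokens that could otherwise hold useful input.

```python
# A minimal sketch of assembling a few-shot classification prompt.
# The example reviews and labels are hypothetical, not from a real dataset.
examples = [
    ("I love this product, it works perfectly!", "positive"),
    ("Arrived broken and support never replied.", "negative"),
]

def build_few_shot_prompt(examples, new_review):
    """Concatenate labeled examples, then append the unlabeled input."""
    parts = [
        f"Classify this review: {review}\nSentiment: {label}\n"
        for review, label in examples
    ]
    parts.append(f"Classify this review: {new_review}\nSentiment:")
    return "\n".join(parts)

# Every example here occupies space in the model's context window.
print(build_few_shot_prompt(examples, "Decent quality for the price."))
```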

Fortunately, there is another solution: you can further train the base model through a process called fine-tuning.

In contrast to pre-training, where LLMs are trained on vast amounts of unstructured text data via self-supervised learning, fine-tuning is a supervised learning process in which you update the weights of the LLM using a dataset of labeled examples.

These labeled examples are prompt-completion pairs, and the fine-tuning process extends the training of the model to improve its ability to generate good completions for a specific task.

A strategy called instruction fine-tuning has been particularly effective at improving the performance of models on a variety of tasks.

Let's take a closer look at how this works. Instruction fine-tuning trains a model using examples that demonstrate how it should respond to a specific instruction. Here are a few example prompts to illustrate the idea. The instruction in both examples is "classify this review", and the desired completion is a text string beginning with "Sentiment:" followed by either "positive" or "negative".
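In code, each instruction fine-tuning example is simply a prompt paired with the completion the model should produce. Here is a hypothetical sketch of two such training records:

```python
# Hypothetical instruction fine-tuning records: each pairs a prompt
# containing an instruction with its expected completion.
training_examples = [
    {
        "prompt": "Classify this review: I loved this DVD!",
        "completion": "Sentiment: positive",
    },
    {
        "prompt": "Classify this review: The battery died within a week.",
        "completion": "Sentiment: negative",
    },
]
```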

The dataset you use for training includes many pairs of prompt-completion examples for the task you are interested in, each of which includes an instruction.

For example, if you wanted to fine-tune a model to improve its summarization ability, you would build a dataset of examples that begin with the instruction "summarize the following text" or a similar phrase. If you were improving the model's translation skills, your examples would include instructions like "translate this sentence". These prompt-completion examples allow the model to learn to generate responses that follow the given instruction.

Instruction fine-tuning in which all of the model's weights are updated is known as full fine-tuning. The process produces a new version of the model with updated weights. It is important to note that, just like pre-training, full fine-tuning requires enough memory and compute budget to store and process all the gradients, optimizer states, and other components being updated during training, so you can benefit from the memory optimization and parallel computing strategies you learned about last week.

So how do you actually carry out instruction fine-tuning of an LLM? The first step is to prepare your training data. There are many publicly available datasets that were used to train earlier generations of language models, although most of them are not formatted as instructions. Fortunately, developers have assembled prompt template libraries that can take existing datasets, such as the large dataset of Amazon product reviews, and turn them into instruction prompt datasets for fine-tuning.

These prompt template libraries include many templates for different tasks and different datasets. Here are three prompts designed to work with the Amazon reviews dataset; they can be used to fine-tune models for classification, text generation, and text summarization tasks.

You can see that in each case you pass the original review (here called review_body) into the template, where it is inserted into text that begins with an instruction such as "give a short sentence describing the following product review". The result is a prompt that now contains both an instruction and the example from the dataset.
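As a sketch, converting a raw record into an instruction prompt can be as simple as string substitution. The template wording below is modeled on the summarization prompt described above, and the field name review_body comes from the dataset; everything else is illustrative:

```python
# A minimal sketch of applying a prompt template to a dataset record.
# The template wording is illustrative, not from a real template library.
template = (
    "Give a short sentence describing the following product review:\n"
    "{review_body}"
)

record = {"review_body": "The blender is loud but crushes ice easily."}
prompt = template.format(**record)
print(prompt)
```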

Once you have prepared the instruction dataset, just like standard supervised learning, you divide the dataset into train, validation, and test splits.
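A quick sketch of that split, assuming the instruction dataset is held in a Python list of prompt-completion records (the 80/10/10 ratio is just a common choice, not from the lecture):

```python
import random

# Stand-in for the full list of prompt-completion records.
dataset = [{"prompt": f"example {i}", "completion": "..."} for i in range(1000)]

random.seed(42)
random.shuffle(dataset)  # shuffle once before splitting

n = len(dataset)
train = dataset[: int(0.8 * n)]                    # used for weight updates
validation = dataset[int(0.8 * n) : int(0.9 * n)]  # monitored during fine-tuning
test = dataset[int(0.9 * n) :]                     # held out for the final evaluation
```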

During fine-tuning, you select prompts from your training dataset and pass them to the LLM, which then generates completions. Next, you compare the LLM's completion with the response specified in the training data. You can see here that the model doesn't do a very good job: it classifies the review as neutral, which is a bit of an understatement, since the review is clearly very positive. Remember that the output of an LLM is a probability distribution over tokens.

So you can compare the distribution of the completion with the distribution of the training label, and use the standard cross-entropy function to calculate the loss between the two token distributions.
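As a PyTorch sketch (assuming the model returns one set of logits over the vocabulary per token position), the loss is the standard cross-entropy between the predicted distributions and the label tokens:

```python
import torch
import torch.nn.functional as F

# Toy shapes: batch of 2 sequences, 5 token positions, vocabulary of 100.
logits = torch.randn(2, 5, 100)          # model output: one distribution per token
labels = torch.randint(0, 100, (2, 5))   # token ids of the reference completion

# Flatten positions and compare each predicted distribution to its label token.
loss = F.cross_entropy(logits.view(-1, 100), labels.view(-1))
print(loss.item())
```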

The calculated loss is then used to update the model weights via standard backpropagation. You do this for many batches of prompt-completion pairs, updating the weights over several epochs, so that the model's performance on the task improves.
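Put together, the update loop looks like ordinary supervised training. Here is a minimal runnable sketch with a toy stand-in model; in practice the real LLM, tokenized batches, and hyperparameters would come from your own setup:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-in for an LLM: an embedding layer plus a linear head over the vocab.
vocab_size = 100
model = nn.Sequential(nn.Embedding(vocab_size, 32), nn.Linear(32, vocab_size))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

# One hypothetical batch of tokenized prompt-completion pairs.
input_ids = torch.randint(0, vocab_size, (4, 8))
labels = torch.randint(0, vocab_size, (4, 8))

for epoch in range(3):                                    # several epochs
    logits = model(input_ids)                             # per-token distributions
    loss = F.cross_entropy(logits.view(-1, vocab_size), labels.view(-1))
    loss.backward()                                       # standard backpropagation
    optimizer.step()                                      # update the model weights
    optimizer.zero_grad()
```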

As with standard supervised learning, you can define a separate evaluation step to measure the LLM's performance on a holdout validation dataset, which gives you the validation accuracy. After you have finished fine-tuning, you can run a final performance evaluation on the holdout test dataset, which gives you the test accuracy.
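A minimal sketch of that held-out evaluation, assuming exact-match accuracy against the reference completions (the model_generate function is hypothetical):

```python
# Hypothetical helper: model_generate(prompt) returns the model's completion.
def accuracy(model_generate, examples):
    correct = 0
    for ex in examples:
        completion = model_generate(ex["prompt"])
        correct += completion.strip() == ex["completion"]  # exact-match scoring
    return correct / len(examples)

# validation_accuracy = accuracy(model_generate, validation)  # during fine-tuning
# test_accuracy = accuracy(model_generate, test)              # final evaluation
```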

The fine-tuning process results in a new version of the base model that is better at the task you are interested in, often called an instruct model. Fine-tuning with instruction prompts is by far the most common way to fine-tune LLMs. From this point on, when you hear or see the term "fine-tuning", you can assume it always means instruction fine-tuning.

Reference

https://www.coursera.org/learn/generative-ai-with-llms/lecture/exyNC/instruction-fine-tuning
