LLMs: Multi-task instruction fine-tuning

Multi-task fine-tuning is an extension of single-task fine-tuning, where the training dataset includes example inputs and outputs for multiple tasks. The dataset contains examples that instruct the model on a variety of tasks, including summarization, review rating, code translation, and entity recognition.

You train the model on this mixed dataset to improve its performance on all tasks simultaneously, avoiding the problem of catastrophic forgetting. Over many epochs of training, the loss computed on these examples is used to update the model's weights, resulting in an instruction-tuned model that has learned how to be good at many different tasks simultaneously.
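To make this concrete, here is a minimal sketch of how a mixed multi-task training set could be assembled with the Hugging Face datasets library. The toy examples and mixing probabilities are illustrative assumptions, not the actual FLAN recipe:

```python
# A sketch of assembling a mixed multi-task instruction dataset.
# The two toy examples below are placeholders; a real mixture would
# draw thousands of examples per task (summarization, rating, etc.).
from datasets import Dataset, interleave_datasets

summarize = Dataset.from_dict({
    "prompt": ["Summarize the following conversation.\n\n"
               "A: Lunch at noon?\nB: Sure, see you then."],
    "label": ["A and B agree to meet for lunch at noon."],
})
rate_review = Dataset.from_dict({
    "prompt": ["Rate this review from 1 to 5.\n\n"
               "Great product, works perfectly."],
    "label": ["5"],
})

# Interleave the per-task datasets so every training batch mixes tasks;
# the probabilities control how often each task is sampled.
mixed = interleave_datasets(
    [summarize, rate_review], probabilities=[0.5, 0.5], seed=42
)
for row in mixed:
    print(row["prompt"][:40], "->", row["label"])
```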

A disadvantage of multi-task fine-tuning is that it requires a lot of data: you may need as many as 50,000 to 100,000 examples in your training set. However, assembling this data can be well worth the effort. The resulting models are often very capable and suitable for situations where good performance on many tasks is required.

Let's look at a family of models trained using multi-task instruction fine-tuning. Instruction-tuned model variants differ based on the datasets and tasks used during fine-tuning. One example is the FLAN family of models. FLAN, which stands for Fine-tuned LAnguage Net, refers to a specific set of instructions used to fine-tune different models. Because FLAN fine-tuning is the last step of the training process, the authors of the original paper called it the metaphorical dessert to the main course of pre-training, quite an apt name.

FLAN-T5 is the FLAN-instruct version of the T5 base model, while FLAN-PALM is the FLAN-instruct version of the PaLM base model.

FLAN-T5 is a good general-purpose instruct model. In total, it has been fine-tuned on 473 datasets across 146 task categories.

These datasets were selected from other models and papers, as shown here. Don't worry about reading all the details now; if you're interested, a reading exercise after the video links to the original paper so you can take a closer look.

An example prompt dataset that FLAN-T5 uses for summarization tasks is SAMSum. It is part of the Muffin collection of tasks and datasets used to train language models to summarize dialogue.

SAMSum is a dataset of 16,000 messenger-like conversations with summaries. Three examples are shown here, with the dialogues on the left and the summaries on the right. The dialogues and summaries were crafted by linguists for the express purpose of generating a high-quality training dataset for language models.

Linguists were asked to create dialogues similar to those they would write every day, mirroring the topical proportions of their real-life Messenger conversations. Language experts then created short summaries containing important information and the names of people in the conversation.
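
If you want to inspect SAMSum yourself, a minimal sketch using the Hugging Face datasets library looks like this (the hub id "samsum" and the field names "dialogue" and "summary" follow the published dataset card):

```python
# A sketch of loading SAMSum and inspecting one example.
from datasets import load_dataset

# The dataset ships a loading script, so recent versions of the
# datasets library require trust_remote_code=True.
samsum = load_dataset("samsum", split="train", trust_remote_code=True)

example = samsum[0]
print(example["dialogue"])  # the messenger-style conversation
print(example["summary"])   # the linguist-written summary
```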

Here is a prompt template designed to work with the SAMSum dialogue summarization dataset. The template is actually made up of several different instructions that all ask the model to do the same thing: summarize a dialogue. For example:

Briefly summarize the conversation.

What is a summary of this conversation?

What happened in that conversation?

Including different ways of phrasing the same instruction helps the model generalize and perform better, just like the prompt templates you saw earlier. In each case, the dialogue from the SAMSum dataset is inserted into the template wherever the dialogue field appears, and the summary is used as the label.

After applying this template to each row of the SAMSum dataset, you can use it to fine-tune a dialogue summarization task.
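
A minimal sketch of what applying these templates might look like; the three instruction strings mirror the variants quoted above, and choosing one at random per row is an assumption about how you could mix them:

```python
# A sketch of applying FLAN-style summarization templates to SAMSum rows.
import random
from datasets import load_dataset

TEMPLATES = [
    "Briefly summarize the conversation.\n\n{dialogue}",
    "What is a summary of this conversation?\n\n{dialogue}",
    "What happened in that conversation?\n\n{dialogue}",
]

def apply_template(example):
    # Insert the dialogue wherever the {dialogue} field appears;
    # the human-written summary serves as the training label.
    template = random.choice(TEMPLATES)
    return {
        "input_text": template.format(dialogue=example["dialogue"]),
        "label_text": example["summary"],
    }

samsum = load_dataset("samsum", split="train", trust_remote_code=True)
templated = samsum.map(apply_template, remove_columns=samsum.column_names)
print(templated[0]["input_text"])
```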

While FLAN-T5 is a general-purpose model that performs well across many tasks, you may still find that it has room for improvement on the tasks in your specific use case. For example, imagine you're a data scientist building an application to support your customer service team by handling incoming requests through a chatbot, like the one shown here.

Your customer service team needs a summary of each conversation to identify the key actions customers are requesting and determine what action should be taken in response.

The SAMSum dataset endows FLAN-T5 with some ability to summarize conversations. However, the examples in the dataset are mostly about conversations between friends about everyday activities and do not have much overlap with the linguistic structures observed in customer service chats.

You can do additional fine-tuning of the FLAN-T5 model using a dialogue dataset that more closely resembles the conversations that occur with your bot. This is the exact scenario you'll be exploring in this week's lab.

You'll use an additional domain-specific summarization dataset called dialogsum to improve FLAN-T5's ability to summarize support chat conversations. This dataset includes over 13,000 support chat conversations and summaries.

The dialogsum dataset is not part of the FLAN-T5 training data, so the model has not seen these dialogs before.
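
Loading dialogsum looks much like loading SAMSum; the hub id "knkarthick/dialogsum" here is a public copy of the dataset on the Hugging Face Hub:

```python
# A sketch of loading dialogsum and inspecting one example.
from datasets import load_dataset

dialogsum = load_dataset("knkarthick/dialogsum")
print(dialogsum)                          # train / validation / test splits
print(dialogsum["train"][0]["dialogue"])  # a conversation
print(dialogsum["train"][0]["summary"])   # its human-written summary
```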

Let's look at an example from dialogsum and discuss how further fine-tuning can improve the model. This is a typical example from the dialogsum dataset: the conversation is between a customer and staff at a hotel check-in desk. A template has been applied to the chat so that an instruction to summarize the conversation appears at the start of the text.

Now, let's see how FLAN-T5 responds to this prompt without any additional fine-tuning. Notice that the prompt is now compressed on the left so that you have more room to examine the model's completion. This is how the model responds to the instruction. You can see that the model was able to recognize that the conversation was about Tommy's booking.
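
A minimal sketch of generating such a zero-shot completion with the Hugging Face transformers library; google/flan-t5-base and the exact prompt wording are assumptions, and the lab may use a different model size and template:

```python
# A sketch of prompting FLAN-T5 zero-shot on a dialogsum example.
from datasets import load_dataset
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "google/flan-t5-base"  # size chosen for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

dialogue = load_dataset("knkarthick/dialogsum", split="test")[0]["dialogue"]
prompt = f"Summarize the following conversation.\n\n{dialogue}\n\nSummary:"

inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(inputs["input_ids"], max_new_tokens=60)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```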

However, it didn't do as well as the human-generated baseline summary, which includes important information such as Mike's request for an early check-in.

The model's completion also invents information that was not included in the original dialogue: specifically, the name of the hotel and the city where it is located.

Now, let's see how the model performs after fine-tuning on the dialogsum dataset. Hopefully you'll agree that this summary is closer to the human-generated one: there is no fabricated information, and the summary includes all the important details, including the names of both people participating in the conversation.

This example uses the public dialogsum dataset to demonstrate fine-tuning on custom data.
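
For reference, a minimal sketch of what that additional fine-tuning step could look like with the Hugging Face Trainer API; the hyperparameters and prompt format are illustrative, not the lab's exact settings:

```python
# A sketch of further fine-tuning FLAN-T5 on dialogsum; hyperparameters
# and the prompt format are illustrative, not the lab's exact recipe.
from datasets import load_dataset
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

model_name = "google/flan-t5-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

def preprocess(example):
    # Apply the summarization template, then tokenize prompt and label.
    prompt = (f"Summarize the following conversation.\n\n"
              f"{example['dialogue']}\n\nSummary:")
    model_inputs = tokenizer(prompt, max_length=512, truncation=True)
    labels = tokenizer(text_target=example["summary"],
                       max_length=128, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

dataset = load_dataset("knkarthick/dialogsum")
tokenized = dataset.map(preprocess,
                        remove_columns=dataset["train"].column_names)

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(
        output_dir="flan-t5-dialogsum",
        learning_rate=5e-5,
        per_device_train_batch_size=8,
        num_train_epochs=1,
    ),
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```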

In practice, you'll get the most bang for your buck by using your company's own internal data for fine-tuning.

For example, a support chat conversation from your customer support application. This will help the model learn how your company likes to summarize conversations and what is most useful to your customer service colleagues.

I know there's a lot to digest here, but don't worry: this example is covered in the lab, where you'll have the opportunity to see it all for yourself and try it out.

One thing you need to consider when fine-tuning is how to evaluate the quality of your model completion. In the next video, you'll learn about several metrics and benchmarks that you can use to determine how well your model performs and how much better your fine-tuned version is than the original base model.

Reference

https://www.coursera.org/learn/generative-ai-with-llms/lecture/notob/multi-task-instruction-fine-tuning
