Two development routes for large language models (LLMs): Finetune vs. Prompt

Foreword


In the study of large language models, researchers hold two different expectations, which can also be understood as two different development routes:

  • Expectation 1: become a specialist that solves one type of task well (e.g., translation, summarization)
  • Expectation 2: become a generalist that, given an instruction (prompt), can complete the corresponding task
    • The earliest research in this direction argued that all NLP tasks can be cast as question answering, thereby unifying them

Below, we introduce these two expectations in turn.


Specialist expectations

Specialists have a chance to beat generalists on a single task. For example, in the paper shown below, ChatGPT performed reasonably well across many tasks (larger values are better), but it still could not beat the specialized models.

[Figure: ChatGPT's scores across many NLP tasks (larger is better) compared with specialized models]
This is the "specialist" usage: take a pretrained base model and adapt it, either by modifying its structure or by fine-tuning some of its parameters:

  • This also matches how BERT is trained: its pretraining objective is filling in blanks in sentences (masked language modeling), so the pretrained model cannot generate complete sentences by itself and must be fine-tuned for specific scenarios.

[Figure: BERT's fill-in-the-blank (masked language model) pretraining]
As shown below, adding different task-specific heads on top of BERT (a structural modification) lets it handle four specific tasks:

[Figure: BERT with different task-specific heads]
Fine-tuning (Finetune) means adjusting the model's parameters with a small amount of task data; you can update the parameters of the whole LLM, or update only the parameters of the newly added structure. A minimal sketch follows.
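The original post gives no code, so here is a minimal sketch assuming the Hugging Face `transformers` library and an illustrative two-class sentiment head; it shows both adding a new head and choosing between tuning everything or only the head:

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Minimal sketch: pretrained BERT backbone + a new task-specific head.
# The model name and the 2-class task are illustrative assumptions.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
backbone = AutoModel.from_pretrained("bert-base-uncased")
head = torch.nn.Linear(backbone.config.hidden_size, 2)  # newly added structure

# Option A: full fine-tuning (all parameters trainable, the default).
# Option B: tune only the new head -- freeze the backbone:
for p in backbone.parameters():
    p.requires_grad = False

inputs = tokenizer("this movie is great", return_tensors="pt")
cls_vec = backbone(**inputs).last_hidden_state[:, 0]  # [CLS] representation
logits = head(cls_vec)                                # task prediction
```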

Adapters (efficient fine-tuning) add small plug-in modules to the large model; when fine-tuning for a downstream task, only the adapter parameters need to be updated (a sketch follows the figure).
[Figure: adapter modules plugged into a large model]
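A minimal PyTorch sketch of such a plug-in, assuming the bottleneck design of Houlsby et al.'s adapters (the sizes here are illustrative); during fine-tuning the optimizer is given only these parameters while the host model stays frozen:

```python
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, nonlinearity, up-project, plus a
    residual connection. Inserted inside each Transformer layer; only the
    adapter's parameters are updated during downstream fine-tuning."""
    def __init__(self, hidden_size: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.GELU()

    def forward(self, x):
        return x + self.up(self.act(self.down(x)))  # residual keeps the host's signal
```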


Generalist expectations

This route matches the popular imagination of "artificial intelligence", and it makes developing new features very convenient: as long as the prompt is redesigned, a new capability can be rolled out quickly, greatly improving efficiency.

For generalists, there are two kinds of task setups:

  • [Instruction Learning] give a description of the task and let the machine answer it;
  • [In-context Learning] give examples and let the machine answer other, similar questions.

In-context Learning

[Core task] Give the model some examples, then let it answer similar questions (a prompt-construction sketch follows the figure):

[Figure: a few-shot prompt: several worked examples followed by a new question]
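A minimal sketch of how such a prompt can be assembled as plain text; the sentiment task and labels here are illustrative assumptions, not from the original post:

```python
# Build an in-context learning prompt: labeled examples, then the new query.
examples = [
    ("The film was a delight.", "positive"),
    ("Terrible pacing and a weak plot.", "negative"),
]
query = "An instant classic."

prompt = "\n".join(f"Review: {text}\nSentiment: {label}" for text, label in examples)
prompt += f"\nReview: {query}\nSentiment:"
print(prompt)  # sent as-is to the LLM; no parameters are updated
```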

At first glance the machine does not seem to learn from the examples, as the following experimental results show:

  • Blue: no examples (very poor)
  • Yellow: examples present and labeled correctly (best)
  • Red: examples present but labeled randomly (only slightly worse than the best case)

[Figure: results for the three settings above]

But the domain of the examples does seem to matter, as follows:

  • An extra purple bar: the examples come from a domain unrelated to the test question and are labeled randomly (performance drops further)

[Figure: results including the out-of-domain, randomly labeled setting (purple)]

Hence one guess: in in-context learning the model does not actually learn from the examples; instead, the examples activate the model and tell it what domain the current task is in, so the exact number of examples is not very important.

However, follow-up work argues that very large models can in fact learn from contextual examples, as the following experimental results show:

  • The darker the color, the larger the model
  • The horizontal axis is the proportion of wrong labels; the vertical axis is the performance metric
  • The more wrongly labeled examples there are, the worse the performance, and the effect is strongest for the larger models, suggesting they really do use the example labels

[Figure: performance vs. proportion of wrong labels, for models of increasing size]

Instruction Learning

A model trained purely on next-token prediction still needs instruction-tuning so that it can switch to the corresponding task based on a description of the problem.

Instruction-tuning aims to achieve something like this:

[Figure: the goal of instruction-tuning]
To do instruction-tuning, you collect a variety of tasks (with annotations) and then rewrite them as instructions, as follows (a template-rewriting sketch appears after the figure):

[Figure: rewriting annotated tasks into instruction form]
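A minimal sketch of that rewriting step, in the spirit of FLAN-style templates; the NLI example and the templates are illustrative assumptions:

```python
# Rewrite one annotated example into (instruction, answer) training pairs.
templates = [
    "Premise: {premise}\nHypothesis: {hypothesis}\nDoes the premise entail the hypothesis?",
    "Premise: {premise}\nBased on this premise, is \"{hypothesis}\" true?",
]

example = {"premise": "A man is playing guitar.",
           "hypothesis": "A person is making music.",
           "label": "yes"}

# Extra keys (like "label") are ignored by str.format, so this is safe.
pairs = [(t.format(**example), example["label"]) for t in templates]
for instruction, answer in pairs:
    print(instruction, "->", answer)
```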

Chain of Thought (CoT)

Later, researchers found that if the in-context examples include the derivation process, a large model's in-context ability is strengthened. Going further, someone added "Let's think step by step" directly to the prompt, and this simple sentence alone also improves the model's performance (a sketch follows the figure).

[Figure: chain-of-thought prompting with intermediate reasoning steps]
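The zero-shot variant is literally one line of prompt construction; a minimal sketch with an illustrative word problem:

```python
question = ("A juggler has 16 balls. Half are golf balls, and half of the "
            "golf balls are blue. How many blue golf balls are there?")

standard = question + "\nAnswer:"               # model must answer directly
cot = question + "\nLet's think step by step."  # trigger phrase elicits reasoning
# Expected style of CoT output: "16 / 2 = 8 golf balls; 8 / 2 = 4 are blue. Answer: 4."
```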

With chain-of-thought, the answers the model generates become more diverse, so a self-consistency method was proposed accordingly: run the model several times, let all the answers that appear vote, and output the most frequent one. A sketch follows.
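A minimal sketch of self-consistency, assuming a hypothetical `sample_answer(prompt)` callable that runs the model once with sampling and returns the extracted final answer:

```python
from collections import Counter

def self_consistency(prompt, sample_answer, n=10):
    """Sample n chain-of-thought completions and majority-vote the answers.
    `sample_answer` is a hypothetical stand-in for one sampled model call."""
    answers = [sample_answer(prompt) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]  # most frequent answer wins
```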

Prompt Engineering

There is also a way to let the model find the prompt by itself:

  • Give input-output examples and let the machine infer the prompt by itself

[Figure: the model infers an instruction from input-output examples]
The complete method: give the model examples like the one above and let it propose candidate prompts, repeat this several times, score each candidate, keep the highest-scoring ones, and feed them back to the LLM so that it generates similar variants, as shown below (a sketch of this loop follows the figure):

[Figure: the iterative prompt-search loop]
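A minimal sketch of this loop; `propose_prompts`, `resample_similar`, and `score` are hypothetical stand-ins for the LLM calls and the held-out evaluation, not a real API:

```python
def search_prompt(examples, propose_prompts, resample_similar, score,
                  n_rounds=3, keep=5):
    """Iterative prompt search: have the LLM guess instructions from examples,
    score each candidate, keep the best, and ask for similar variants.
    All three callables are hypothetical stand-ins."""
    candidates = propose_prompts(examples)          # initial guesses
    for _ in range(n_rounds):
        best = sorted(candidates, key=score, reverse=True)[:keep]
        candidates = best + resample_similar(best)  # mutate the survivors
    return max(candidates, key=score)
```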

