Distilling Step-by-Step: You can beat an LLM with less training data and a smaller model!

Introduction

The authors point out that deploying large language models brings challenges such as latency, memory footprint, and compute cost, so a current trend is to fine-tune or distill a smaller language model (for example Vicuna or Alpaca). However, obtaining labeled data for a specific downstream task is difficult and expensive.


To address these problems, the authors propose Distilling Step-by-Step, which can beat an LLM on the same datasets while using less training data and a much smaller model. (In the paper, a 770M T5 model trained this way outperforms the 540B PaLM.)

Method

Distilling Step-by-Step consists of two steps:

  1. Prompt an LLM on unlabeled data with chain-of-thought (CoT) prompting so that it generates both a label and a rationale (an explanation of why that answer is obtained).
  2. Fine-tune the small model on the resulting data.

The first step works as follows: the LLM is given a few-shot CoT prompt in which each exemplar contains a question, a rationale, and a label. For each new unlabeled input, the LLM then produces a rationale together with its predicted label.
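As a rough sketch of what this step could look like in code (the exemplar texts, the `llm` callable, and the "So the answer is" output format are illustrative assumptions, not the authors' exact setup):

```python
# Minimal sketch of step 1: prompt an LLM with few-shot CoT exemplars and
# split its reply into a rationale and a label. The exemplar content and the
# "So the answer is" answer format are assumptions for illustration.
FEW_SHOT_EXEMPLARS = [
    {
        "question": "Sammy wanted to go to where the people were. Where might he go?",
        "rationale": "The answer must be a place with a lot of people.",
        "label": "populated areas",
    },
    # ... more exemplars in the same format
]

def build_cot_prompt(question: str) -> str:
    """Concatenate (question, rationale, label) exemplars, then the new question."""
    parts = [
        f"Q: {ex['question']}\nA: {ex['rationale']} So the answer is {ex['label']}.\n"
        for ex in FEW_SHOT_EXEMPLARS
    ]
    parts.append(f"Q: {question}\nA:")
    return "\n".join(parts)

def annotate(question: str, llm) -> dict:
    """`llm` is any callable mapping a prompt string to a completion string."""
    reply = llm(build_cot_prompt(question))
    rationale, _, label = reply.partition("So the answer is")
    return {"input": question, "rationale": rationale.strip(), "label": label.strip(" .")}
```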

In this way, the small model learns not only how to perform the task but also why a given answer is correct, which deepens its understanding of the specific task.

Each example now consists of x_i (the original unlabeled input), r_i (the LLM-generated rationale), and y_i (the LLM-generated label), and the authors tie the three together during fine-tuning.

The student takes the question as input, and the training target is extended from the answer alone to the answer plus the rationale for how to solve it. Concretely, the paper casts this as multi-task learning: with a [label] prefix prepended to x_i the model is trained to output y_i, and with a [rationale] prefix it is trained to output r_i.

When computing the loss, the two objectives are combined with a weight.
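Reconstructed from the paper (λ is the weight on the rationale loss and ℓ is the token-level cross-entropy), the objective has the following form:

$$
\mathcal{L} = \mathcal{L}_{\text{label}} + \lambda\,\mathcal{L}_{\text{rationale}},\qquad
\mathcal{L}_{\text{label}} = \frac{1}{N}\sum_{i=1}^{N} \ell\big(f(x_i),\, y_i\big),\qquad
\mathcal{L}_{\text{rationale}} = \frac{1}{N}\sum_{i=1}^{N} \ell\big(f(x_i),\, r_i\big)
$$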
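For concreteness, here is a minimal sketch of the fine-tuning objective, assuming a Hugging Face T5 student; the [label]/[rationale] prefixes follow the paper, but the preprocessing is simplified and this is not the authors' released code:

```python
# Rough sketch of the multi-task fine-tuning objective with a T5 student.
# Simplified: single example, no batching, padding, or optimizer loop.
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

def task_loss(prefix: str, question: str, target: str) -> torch.Tensor:
    """Cross-entropy of the student on one (prefixed input -> target) pair."""
    enc = tokenizer(f"{prefix} {question}", return_tensors="pt")
    labels = tokenizer(target, return_tensors="pt").input_ids
    return model(**enc, labels=labels).loss

def distill_step_loss(example: dict, lam: float = 1.0) -> torch.Tensor:
    """Weighted sum of the label-prediction and rationale-generation losses."""
    label_loss = task_loss("[label]", example["input"], example["label"])
    rationale_loss = task_loss("[rationale]", example["input"], example["rationale"])
    return label_loss + lam * rationale_loss
```

At inference time only the [label] prefix is used, so the student does not need to generate a rationale and pays no extra cost at prediction time.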

Experiment

The experiments compare Distilling Step-by-Step with standard fine-tuning and standard task distillation. Across the benchmarks it reaches better performance with far fewer labeled examples, and the 770M T5 student ends up outperforming the 540B PaLM while using only a fraction of the available training data.

Reference

Cheng-Yu Hsieh et al., "Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes," 2023. https://arxiv.org/pdf/2305.02301.pdf
