Lion: Adversarial Distillation for Closed Source Large Language Models

Lion, an adversarial distillation framework for closed-source large language models proposed by the Hong Kong University of Science and Technology, transfers the knowledge of ChatGPT to a 7B-parameter LLaMA model (named Lion), reaching roughly 95% of ChatGPT's capability with only 70k training examples. Moreover, the framework is general: beyond distilling ChatGPT, it can conveniently be applied to other closed-source LLMs.

Paper title:

Lion: Adversarial Distillation for Closed Source Large Language Models

Paper link:

https://arxiv.org/abs/2305.12870

Project address:

https://github.com/YJiangcm/Lion

Method overview

Specifically, the authors design a prompt that lets the closed-source LLM act as a "referee" to identify hard instructions on which there is a significant performance gap between the teacher's answer and the student's answer, and another prompt that lets it act as a "generator" to produce new instructions that mimic the data distribution of those hard instructions. The proposed adversarial distillation framework is shown in the figure below; each iteration consists of three stages:

1) The imitation stage: for a set of instructions, align the student's responses with the teacher's;

2) The discrimination stage: identify hard instructions;

3) The generation stage: based on the identified hard instructions, generate new hard instructions to increase the challenge to the student model.

Considering that the student model may suffer catastrophic forgetting during learning, the authors also generate an equal number of new easy instructions to increase the diversity of the training data. For details, please refer to the original paper:

https://arxiv.org/abs/2305.12870

Essentially, this adversarial framework forms a positive feedback loop, effectively improving the capabilities of the student model.

[Figure: overview of the adversarial distillation framework]
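The three stages above can be sketched as a toy Python loop. This is only an illustration: `teacher`, `student`, `referee_gap`, and the "variant" strings are assumed stand-ins, not the paper's actual prompts or models.

```python
# Toy sketch of one adversarial distillation iteration (imitation,
# discrimination, generation). All functions here are illustrative
# stand-ins, not the paper's actual implementation.

def referee_gap(teacher_answer, student_answer):
    # Stand-in for the closed-source LLM acting as "referee": in the
    # paper it scores both answers; here we fake a quality gap.
    return abs(len(teacher_answer) - len(student_answer))

def adversarial_round(instructions, teacher, student, threshold=5):
    # 1) Imitation: collect teacher responses as fine-tuning targets
    #    (the real framework fine-tunes the student on these pairs).
    pairs = [(x, teacher(x)) for x in instructions]

    # 2) Discrimination: "hard" instructions are those with a large
    #    teacher-student performance gap.
    hard = [x for x, ref in pairs if referee_gap(ref, student(x)) > threshold]

    # 3) Generation: ask the "generator" LLM for new instructions that
    #    mimic the hard ones, plus an equal number of easy ones to keep
    #    the training data diverse and avoid catastrophic forgetting.
    new_hard = [f"(hard variant of) {x}" for x in hard]
    new_easy = [f"(easy variant of) {x}" for x in hard]
    return pairs, new_hard + new_easy
```

Each round then feeds the newly generated instructions back in as the next round's training pool, which is exactly the positive feedback loop described below.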

Experimental results

To verify the effectiveness of the method, the authors apply the proposed adversarial distillation framework to the well-known closed-source large language model ChatGPT and transfer its knowledge to LLaMA, an open-source pretrained base model with 7 billion parameters. They chose Alpaca's training data (generated from only 175 manually selected seed instructions) as the initial training instructions and ran 3 iterations of adversarial knowledge distillation (AKD), using a total of 70k instruction-following examples for training. The final trained model is named Lion.

The authors selected a series of prior works, including LLaMA, Alpaca, Vicuna, and WizardLM, as baselines. For a fair comparison, all models are set to 7B parameters. Following previous work, the authors used two evaluation methods: 1) automatic evaluation using GPT-4; 2) human evaluation based on alignment criteria.

3.1 Automatic Evaluation with GPT-4

Prior research suggests that GPT-4 can produce highly consistent rankings and comprehensive evaluations when comparing chatbot responses. Here, the authors use GPT-4 to automatically score the quality of two models' responses on the 80 Vicuna-Instructions (on a scale from 1 to 10). Taking ChatGPT's answers as the reference, they compare each model against ChatGPT in pairs and compute a model's overall answer quality relative to ChatGPT as the ratio of the two models' score sums.

As shown in the figure below, Lion (7B) improves the relative score by at least 5.45% over the other baseline models and reaches 94.74% of ChatGPT's reply quality.

[Figure: response quality relative to ChatGPT, assessed by GPT-4]
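The relative-quality metric described above is simply a ratio of score sums; a minimal sketch, with made-up scores rather than the paper's actual numbers:

```python
# Relative response quality against ChatGPT: ratio of GPT-4 score sums.
# The score lists below are invented for illustration.

def relative_quality(model_scores, reference_scores):
    # Each list holds one 1-10 GPT-4 score per Vicuna instruction.
    return sum(model_scores) / sum(reference_scores)

lion_scores = [8, 9, 7, 8]     # hypothetical GPT-4 scores for Lion
chatgpt_scores = [9, 9, 8, 8]  # hypothetical scores for ChatGPT
print(f"{relative_quality(lion_scores, chatgpt_scores):.2%}")  # 94.12%
```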

To comprehensively compare Lion's ability to generate high-quality responses against the other baselines, the authors plot the relative response quality across task categories, as shown in the figure below. Notably, Lion slightly outperforms ChatGPT in the generic, knowledge, common-sense, and counterfactual task categories. In addition, Lion outperforms the baseline models by at least 26.67% on math tasks and beats most of them on code-generation tasks.

[Figure: relative response quality across task categories]

3.2 Human Evaluation with Alignment Criteria

To evaluate the alignment quality of LLMs, the authors follow the 3H criteria adopted by previous studies: only models that are helpful, honest, and harmless (HHH) are considered aligned. These criteria measure how well an AI system aligns with human values.

The authors performed a human evaluation on the 252 User-Oriented-Instructions and compare the frequencies of wins, ties, and losses between Lion and the other models in the figure below. The human evaluation shows that Lion's answers outperform all baselines except ChatGPT. Specifically, compared with WizardLM, Lion wins on 81 of the 252 instructions and loses on only 58. These findings suggest that the proposed framework makes Lion highly effective at learning diverse instructions.

[Figure: win/tie/loss frequencies between Lion and the baseline models]
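From the counts reported above, the tie count against WizardLM follows by simple arithmetic (a small sanity check, not data from the paper beyond the three stated numbers):

```python
# Win/tie/loss breakdown for Lion vs. WizardLM on the 252
# User-Oriented-Instructions, using the counts given in the text.
total, wins, losses = 252, 81, 58
ties = total - wins - losses            # the remaining instructions
win_or_tie_rate = (wins + ties) / total
print(ties, f"{win_or_tie_rate:.1%}")   # 113 77.0%
```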

Conclusion

The article proposes an innovative adversarial knowledge distillation (AKD) framework for distilling a closed-source large language model (LLM) into a "compact" open-source student model. Whereas previous methods focused on one-way knowledge transfer, this method incorporates teacher-student "feedback" into the learning process. The authors exploit the versatile role adaptability of LLMs, using different prompts to have the closed-source model identify "hard" instructions and generate new "hard" instructions for the student model, creating a three-stage adversarial loop of imitation, discrimination, and generation.

This approach iteratively and efficiently improves the performance of the student model. Applying the framework, the authors distilled ChatGPT into an open-source student model based on LLaMA with only 7 billion parameters, named Lion. Despite being trained on only 70k instruction-following examples, Lion demonstrates nearly 95% of ChatGPT's capability, surpassing previous baselines in both GPT-4 automatic evaluation and human evaluation. The authors hope Lion can serve both as a baseline reflecting ChatGPT's performance and as a baseline for open-source instruction-following models in the NLP community.

Limitations and Discussion

The authors note at the end that the Lion model still has the following limitations:

1) The model has limited ability on tasks involving complex programming or mathematical computation;

2) The training data contains no dialogue, so Lion is weak at multi-turn dialogue;

3) The model's input sequence length is capped at 4096 tokens and its newly generated output at 1024 tokens, so it cannot handle very long documents;

4) The model's safety, and the toxicity and bias of its output, have not been optimized.
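The length limits in 3) can be made concrete with a small helper. This is an illustrative sketch of how such limits are typically enforced before calling a generation API, not the model's actual code; the function name is assumed.

```python
# Illustrative enforcement of the stated limits: at most 4096 input
# tokens and 1024 newly generated tokens per request.
MAX_INPUT_TOKENS = 4096
MAX_NEW_TOKENS = 1024

def clip_request(prompt_ids, requested_new_tokens):
    # Keep the most recent input tokens and cap the generation budget.
    clipped_prompt = prompt_ids[-MAX_INPUT_TOKENS:]
    budget = min(requested_new_tokens, MAX_NEW_TOKENS)
    return clipped_prompt, budget
```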

The authors also point out that a unified and comprehensive evaluation metric is needed for measuring the capabilities of large models.

Reference: https://it.sohu.com/a/680520547_121119001

Origin blog.csdn.net/linjie_830914/article/details/131543741