Microsoft's Orca learns from GPT-4's complex explanation traces to significantly enhance smaller models

Large foundation models (LFMs) such as ChatGPT and GPT-4 have demonstrated impressive zero-shot performance across a wide range of tasks. Their success can be attributed to the scaling of model and dataset sizes, as well as fine-tuning to align them with user intent.

As these models continue to advance, an interesting question arises: can these models supervise their own behavior, or that of other models, with little human intervention?

To answer this question, a wave of recent research has used LFMs as teachers to generate datasets for training smaller models. However, the resulting student models often fall well short of their teachers in reasoning and comprehension ability.
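To make the teacher-student setup concrete, a minimal sketch of such a data-generation loop might look like the following. The function name and prompts here are hypothetical, and an OpenAI-style chat completions API stands in for whatever tooling a given study actually uses:

```python
# Minimal sketch of LFM-as-teacher data generation (hypothetical helper name;
# assumes an OpenAI-style chat completions API).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def collect_teacher_responses(prompts, model="gpt-4"):
    """Query the teacher model and return (prompt, response) training pairs."""
    pairs = []
    for prompt in prompts:
        reply = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        pairs.append(
            {"prompt": prompt, "response": reply.choices[0].message.content}
        )
    return pairs


# The resulting pairs would then be used to fine-tune a smaller student model.
training_pairs = collect_teacher_responses(["Why is the sky blue?"])
```

The weakness this line of work runs into is visible in the data itself: plain prompt/response pairs capture *what* the teacher answered, but not *how* it reasoned, which is the gap Orca targets.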

To address this issue, in the new paper Orca: Progressive Learning from Complex Explanation Traces of GPT-4, a Microsoft Research team presents Orca, a 13-billion-parameter model that learns to imitate GPT-4's explanation traces, step-by-step thought processes, and other complex instructions, significantly improving on existing state-of-the-art instruction-tuned models.

The team makes three key contributions, namely explanation tuning, scaling of tasks and instructions, and evaluation, to address the challenges that current instruction-tuned models face in task diversity, query complexity, and data scaling.

In explanation tuning, the researchers observe that query-and-response pairs from GPT-4 can provide valuable signals for student-model learning. They therefore augment these pairs with detailed responses that better explain the teacher's reasoning as it generates each response.
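In practice, this augmentation amounts to prepending a system instruction that asks the teacher to expose its reasoning. The sketch below illustrates the idea only; the system-message wording and the helper name are assumptions, not the paper's released code, and an OpenAI-style chat API again stands in for the actual pipeline:

```python
# Sketch of explanation-tuning data augmentation: the same user query is sent
# with a system instruction that asks the teacher to expose its reasoning.
from openai import OpenAI

client = OpenAI()

# A system instruction in the spirit of the paper; the exact wording of
# Orca's system messages may differ.
EXPLAIN_SYSTEM = (
    "You are a helpful assistant. Think step-by-step and justify your answer."
)


def augment_with_explanation(query, model="gpt-4"):
    """Return a <system, query, detailed response> triple for student training."""
    reply = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": EXPLAIN_SYSTEM},
            {"role": "user", "content": query},
        ],
    )
    return {
        "system": EXPLAIN_SYSTEM,
        "query": query,
        # Unlike a bare answer, this response carries the explanation trace.
        "response": reply.choices[0].message.content,
    }
```

The student is then trained on the full triple, so it sees not only the final answer but the chain of reasoning that produced it.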

In scaling tasks and instructions, they utilize the Flan 2022 Collection, sampling from its task collection to obtain a diverse mix of tasks.
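The sampling step can be pictured as a weighted draw over the collection's sub-sources. The snippet below is purely illustrative: the source names, mixture weights, and example tasks are invented placeholders, not FLAN 2022's actual composition.

```python
import random

# Hypothetical task collection keyed by source; FLAN 2022's real structure
# and mixture weights differ.
task_collection = {
    "cot": ["Solve: 17 * 24 = ?", "Explain why the argument below is invalid."],
    "niv2": ["Summarize the paragraph below.", "Classify the review's sentiment."],
    "flan2021": ["Translate to French: 'good morning'", "Answer the trivia question."],
}
mixture_weights = {"cot": 0.2, "niv2": 0.5, "flan2021": 0.3}


def sample_queries(n, seed=0):
    """Draw n queries: pick a source by its mixture weight, then a task uniformly."""
    rng = random.Random(seed)
    sources = list(mixture_weights)
    weights = [mixture_weights[s] for s in sources]
    picks = rng.choices(sources, weights=weights, k=n)
    return [rng.choice(task_collection[s]) for s in picks]


queries = sample_queries(5)
```

Sampling by source rather than uniformly over all tasks keeps the resulting instruction mix diverse instead of being dominated by whichever sub-collection happens to be largest.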
