The Evolution of Open Source Large Language Models: Catching Up with LLaMA-2


This article is the third part of a series on the history of open source LLM development. The first part, "The Evolution of Open Source Large Language Models: Early Innovations", reviewed the initial attempts to create open source LLMs. The second part, "The Evolution of Open Source Large Language Models: Competition for High-Quality Base Models", examined the most popular open source base models currently available (i.e., language models that have been pretrained but not yet fine-tuned or aligned).

This article describes how fine-tuning and alignment improve the performance of open source models such as LLaMA-2, narrowing the gap between open source and proprietary LLMs.

(The author of this article is Cameron R. Wolfe, Director of AI at Rebuy and a Ph.D. in deep learning. The following content is compiled and published by OneFlow with the author's authorization. Please contact the author for permission to reprint. Original text: https://cameronrwolfe.substack.com/p/the-history-of-open-source-llms-imitation)

Author | Cameron R. Wolfe

Compiled by OneFlow

Translation | Yang Ting, Wan Zilin


(Quoted from [1,2,12])

Most previous research on open source large language models has focused on creating pre-trained base models. However, because these models are not fine-tuned and lack alignment, their quality cannot match that of top closed-source LLMs (such as ChatGPT or Claude). Paid models are usually fully aligned using techniques such as SFT (supervised fine-tuning) and RLHF (reinforcement learning from human feedback), which greatly improves their usability. In contrast, open source models are typically fine-tuned on a smaller scale using smaller publicly available datasets. This article looks at recent research that improves the quality of open source LLMs through more extensive fine-tuning and alignment.


(Quoted from [17,18])

Alignment process. This article examines the fine-tuning and alignment process for open source LLMs. Before getting into the details, we need to understand what alignment is and how it is performed. The training of a language model can usually be divided into several stages. As shown in the figure above, we first pre-train the model and then perform multiple fine-tuning steps. After pre-training, an LLM can accurately predict the next token, but its output may be repetitive and monotonous. The model therefore needs to be fine-tuned to improve its alignment, allowing it to generate text consistent with human users' expectations (such as following instructions, avoiding harmful output, preventing misinformation, and producing interesting or creative output).


(Quoted from [17])

Supervised fine-tuning (SFT). Alignment can be achieved through two fine-tuning techniques: supervised fine-tuning and reinforcement learning from human feedback (see the figure above for more details). SFT simply fine-tunes the model with the standard language modeling objective on high-quality examples of prompts and responses, from which the LLM learns how to respond appropriately. SFT is simple and effective, but it requires careful curation of a dataset that captures the "correct" behavior.
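To make this concrete, here is a minimal SFT sketch in PyTorch using the Hugging Face transformers API. The model name, example data, and hyperparameters are placeholders for illustration, not the setup of any specific paper; many implementations also mask the prompt tokens out of the loss, which is omitted here for brevity.

```python
# Minimal SFT sketch (placeholder model/data; prompt tokens are not masked for brevity).
import torch
from torch.utils.data import DataLoader
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder: any pretrained causal LM
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # LLaMA tokenizers have no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

# Each example is a curated (prompt, response) pair demonstrating the desired behavior.
examples = [
    {"prompt": "Explain why the sky is blue.",
     "response": "Sunlight scatters off air molecules, and blue light scatters the most..."},
]

def collate(batch):
    # Concatenate prompt and response, then apply the standard next-token objective.
    texts = [ex["prompt"] + "\n" + ex["response"] + tokenizer.eos_token for ex in batch]
    enc = tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=1024)
    labels = enc["input_ids"].clone()
    labels[enc["attention_mask"] == 0] = -100  # ignore padding in the cross-entropy loss
    enc["labels"] = labels
    return enc

loader = DataLoader(examples, batch_size=1, shuffle=True, collate_fn=collate)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

model.train()
for batch in loader:
    loss = model(**batch).loss  # cross-entropy over next-token predictions
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```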

RLHF. The LLM is trained directly on feedback from human annotators, who identify outputs they like, and the LLM then learns how to generate more outputs like them. To do this, we first need a set of prompts and then generate multiple different outputs from the LLM for each prompt. Human annotators then score each generated response based on its quality. These scores can be used to train a reward model (i.e., a fine-tuned version of the LLM with an additional regression head) that predicts the score of any response. Next, RLHF fine-tunes the model with the PPO reinforcement learning algorithm to maximize this score. Typically, high-quality LLMs are aligned sequentially with SFT and then RLHF (using extensive human feedback).
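As an illustration of the reward-modeling half of this pipeline, the sketch below adds a scalar regression head to a pretrained transformer and trains it on one preference pair with the standard ranking loss, -log sigmoid(r_chosen - r_rejected). The backbone, example data, and pooling choice are assumptions for illustration; the PPO optimization step that follows reward modeling is omitted.

```python
# Sketch of a reward model: a pretrained transformer plus a scalar regression head,
# trained on a human preference pair (backbone and data are placeholders).
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

class RewardModel(nn.Module):
    def __init__(self, base_name: str):
        super().__init__()
        self.backbone = AutoModel.from_pretrained(base_name)
        self.score_head = nn.Linear(self.backbone.config.hidden_size, 1)

    def forward(self, input_ids, attention_mask):
        hidden = self.backbone(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        last_idx = attention_mask.sum(dim=1) - 1  # index of the last non-padding token
        pooled = hidden[torch.arange(hidden.size(0)), last_idx]
        return self.score_head(pooled).squeeze(-1)  # one scalar score per sequence

base_name = "gpt2"  # small placeholder backbone
tokenizer = AutoTokenizer.from_pretrained(base_name)
tokenizer.pad_token = tokenizer.eos_token
reward_model = RewardModel(base_name)

# One preference pair: the annotator preferred `chosen` over `rejected` for this prompt.
prompt = "Summarize the plot of Hamlet."
chosen = "Hamlet is a tragedy in which the prince of Denmark avenges his father's murder..."
rejected = "idk, something about a ghost."

enc = tokenizer([prompt + "\n" + chosen, prompt + "\n" + rejected],
                return_tensors="pt", padding=True, truncation=True)
scores = reward_model(enc["input_ids"], enc["attention_mask"])

# Pairwise ranking loss: push the chosen response's score above the rejected one's.
loss = -F.logsigmoid(scores[0] - scores[1])
loss.backward()
```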

1

Imitation learning


(Quoted from [16])

After the release of LLaMA [3], the open source research community could finally use a powerful base LLM to fine-tune or align models for a variety of different applications. This triggered explosive growth in open source LLM research, with practitioners rushing to fine-tune LLaMA for tasks of their choice. Interestingly, one of the most common research directions during this period was imitation learning. Imitation learning can be seen, to some extent, as an alignment method that optimizes one LLM by fine-tuning it on the output of another, more powerful LLM. This approach is inspired by knowledge distillation; see the figure above for details.

"The premise of model imitation is that once the proprietary LM is provided through the API, the output data set of the API can be collected and used to fine-tune the open source LLM." – Quoted from [6]

The question posed by open source imitation learning research is simple: can we create a model as powerful as ChatGPT or GPT-4 simply by fine-tuning on the responses they generate? To find out, we can use the following simple approach (a minimal sketch follows the list below):

  •  Collect conversation examples for these models (e.g. using the OpenAI API)

  •  Perform (supervised) fine-tuning on this data (using the normal language modeling objective)
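A minimal sketch of the first step, assuming the modern OpenAI Python client (openai>=1.0), is shown below; the prompts, model choice, and output path are illustrative. Note that, as discussed later, OpenAI's terms restrict commercial use of data collected this way.

```python
# Sketch of imitation-data collection via the OpenAI API (openai>=1.0).
# Prompts, model choice, and output path are illustrative.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

prompts = [
    "Explain the difference between supervised fine-tuning and RLHF.",
    "Write a short poem about open source software.",
]

with open("imitation_data.jsonl", "w") as f:
    for prompt in prompts:
        resp = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
        )
        # Store the (prompt, response) pair; this file later becomes the SFT dataset.
        record = {"prompt": prompt, "response": resp.choices[0].message.content}
        f.write(json.dumps(record) + "\n")
```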

There has been a long and heated discussion in the research community about whether imitation learning is a valuable learning method. Ultimately, we found that this approach worked in practice, but only under certain conditions.

2

Initial efforts at imitation learning


LLaMA has spawned numerous imitation models (cited from [7,8,9,10])

After the release of LLaMA, researchers quickly released various imitation models trained on conversations derived from ChatGPT. The data used to train these models (which prohibits commercial use) is typically obtained from the OpenAI API or from sources like ShareGPT. Some well-known imitation models are summarized below in chronological order.

Alpaca [7] automatically collects a fine-tuning dataset from GPT-3.5 (i.e., text-davinci-003) via the self-instruct [11] framework and uses it to fine-tune LLaMA-7B. Collecting the data and fine-tuning Alpaca costs only about $600.

Vicuna [8] fine-tunes LLaMA-13B on 70,000 conversation examples with ChatGPT (derived from ShareGPT). Interestingly, the entire fine-tuning process for Vicuna costs only about $300.

Koala [9] fine-tunes LLaMA-13B on a large number of conversation examples drawn from Alpaca's fine-tuning dataset and other sources, such as ShareGPT, HC3, OIG, Anthropic HH, and OpenAI WebGPT/Summarization. Compared to previous imitation models, Koala is fine-tuned on larger datasets and evaluated more comprehensively.

GPT4All [10] fine-tunes LLaMA-7B on 800,000 chat completions from GPT-3.5-turbo. The authors also released training/inference code and quantized model weights, allowing inference with limited compute (such as on a laptop).


(Quoted from [8,9])

The influence of imitation. These imitation models were released in quick succession and claimed quality comparable to top proprietary models such as ChatGPT and GPT-4. For example, Vicuna was claimed to achieve 92% of ChatGPT's quality (as evaluated by GPT-4), and Koala was claimed to match or exceed ChatGPT in most cases; see the figure above. These results suggest that, through imitation learning, we can extract the capabilities of any proprietary model into a smaller open source LLM. If this were true, anyone could easily copy the best proprietary LLMs, and proprietary LLMs would lose their edge.

"Open source models are fast, customizable, more private, and of higher quality. The open source community only needs $100 and 13B parameters to do what Google spent $10 million and 540B parameters to barely complete. tasks. Furthermore, it would take Google months to achieve these goals, while the open source model can be completed in just weeks." ——Quoted from [9]

The explosion of imitation models marked the first time that open source models were seriously considered as potential alternatives to the closed-source LLMs that have dominated the field since GPT-3. Although paid APIs had become the standard, the strong performance of imitation models brought hope for open source LLMs.

3

Is the imitation model realistic and feasible?


(Quoted from [6])

Although the impressive performance of imitation models brought hope, [6] shows that we had overlooked something important: these models needed more targeted evaluation. Under such evaluation, imitation models turn out to be far inferior to top proprietary LLMs like ChatGPT and GPT-4. In fact, in most cases, fine-tuning a base model through imitation cannot bridge the large performance gap between open source and proprietary models. Instead, the resulting model is only better at tasks heavily represented in the fine-tuning set, and it is also more prone to hallucination.


(Quoted from [6])

Experimental setup: To evaluate the effectiveness of imitation learning, the authors of [6] curated approximately 130,000 diverse conversation examples from ChatGPT and constructed a dataset. They then fine-tuned language models of different sizes on varying amounts of imitation data and measured their performance. Several interesting observations emerge from these experiments:

  • In human evaluation experiments, increasing the amount of imitation data used for fine-tuning does not improve model quality.

  • Imitation models often perform worse than base models on standardized benchmarks (the quality even decreases when more imitation data is used).

  • Increasing the size of the base model can gradually improve the quality of the imitation model.

What exactly happened? When imitation models are evaluated on a broader range of natural language benchmarks, we find that their quality is comparable to, or slightly worse than, that of their underlying base LLMs. In other words, imitation models do not actually match the quality of models like ChatGPT. Their knowledge base is more limited than that of proprietary LLMs, which is supported by the observation that larger base models produce better imitation models.

"In our view, the best strategy for improving open source models is to solve the core problem of developing better base LLMs, rather than imitating proprietary systems." ——Quoted from [6]

At this point, we should ask: why did these models initially seem to perform so well? In [6], we see that imitation models learn to imitate the style of models like ChatGPT. So even if a model produces factual errors more frequently (errors that are harder to quickly check or verify), human annotators can still be fooled into thinking the model's output is of high quality.

4

Does imitation learning really work?

"Research shows that learning from step-by-step explanations, whether generated by humans or more advanced AI models, is an effective way to improve model capabilities and skills." ——Quoted from [12]

The finding in [6] that imitation models did not perform as well as initially expected left the research community unsure of the true value of imitation. However, it is worth noting that the analysis in [6] showed that local imitation (learning to imitate a model's behavior on a specific task, rather than its overall behavior) is quite effective. Still, this does not mean that imitation models can broadly match the quality of proprietary models. To improve the overall quality of imitation models, the authors of [6] proposed two approaches:

  • Generate a larger, more comprehensive imitation dataset

  • Create a better base model for imitation learning

Interestingly, subsequent studies have explored both pathways in depth, showing that both can have positive effects.

(Quoted from [12])

Orca [12] is an imitation model based on LLaMA-13B. Compared to previous imitation learning efforts, however, Orca is trained on a higher-quality, more detailed, and more comprehensive dataset collected from ChatGPT and GPT-4. The datasets previously used for imitation learning are "shallow": they contain only prompt and response examples generated by models like ChatGPT; see the figure above.

"We conclude that if we are to broadly match ChatGPT purely through imitation, we will need to focus on collecting large imitation datasets and collect imitation data that is more diverse and of higher quality than is currently available. ”——Quoted from [6]

To improve on shallow imitation, Orca augments the imitation dataset generated by models such as ChatGPT or GPT-4 with:

  •  Explanation traces

  •  Step-by-step thought processes

  • Complex instructions

To this end, the proprietary LLM is prompted via instructions or a system message to provide a detailed explanation of its responses. This approach goes beyond simple prompt-response pairs by adding additional useful information to the data visible to the imitation model. When learning from powerful models like ChatGPT, Orca does not just see the model's responses: it learns from the detailed explanations and thought processes generated alongside those responses on complex prompts. Here is an example to illustrate.


(Quoted from [12])
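For illustration only, the sketch below shows one way such explanation-augmented imitation data might be collected; the system message wording, prompts, and record format are assumptions, not the exact instructions or pipeline used by Orca [12].

```python
# Sketch of collecting explanation-augmented imitation data (system message and prompts
# are illustrative, not the exact instructions used by Orca [12]).
import json
from openai import OpenAI

client = OpenAI()

# The system message asks the teacher model for step-by-step reasoning, not just an answer.
system_message = (
    "You are a helpful assistant. Think step by step, justify each step, "
    "and explain your reasoning as if teaching a student."
)

complex_prompts = [
    "A train leaves at 3 pm at 60 km/h; another leaves at 4 pm at 90 km/h on the same track. "
    "When does the second train catch up?",
]

with open("orca_style_data.jsonl", "w") as f:
    for prompt in complex_prompts:
        resp = client.chat.completions.create(
            model="gpt-4",
            messages=[
                {"role": "system", "content": system_message},
                {"role": "user", "content": prompt},
            ],
        )
        # The stored response contains the explanation trace alongside the final answer,
        # giving the student model a richer training signal than a bare reply.
        record = {"system": system_message, "prompt": prompt,
                  "response": resp.choices[0].message.content}
        f.write(json.dumps(record) + "\n")
```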

After extensive fine-tuning on this detailed imitation dataset (5 million examples from ChatGPT and 1 million from GPT-4), Orca performs extremely well compared to previous imitation models, as shown in the figure below.


While Orca significantly narrows the gap between open source imitation models and proprietary LLMs, we can still see from the table below that it consistently falls short of GPT-4 in quality. Unfortunately, even with improved imitation methods, open source imitation models cannot fully match the quality of top proprietary models.


However, Orca's strong performance demonstrates that imitation learning is a valuable fine-tuning strategy that can significantly improve the quality of a good base LLM. The analysis in [12] further identifies two important requirements for successfully using imitation learning:

  • A large-scale, comprehensive imitation dataset

  • Each response includes a detailed explanation trace

Better base LLMs. Although the authors of [6] argue that collecting a sufficiently large and diverse dataset for imitation learning is very difficult, Orca demonstrates the feasibility of this approach. In addition, subsequent research also explored the other avenue suggested in [6]: creating more powerful (open source) base models. While open source pre-trained LLMs initially performed poorly, we have recently seen the release of various powerful pre-trained LLMs, such as LLaMA [3], MPT [14, 15], and Falcon [13]. Given that pre-training is the starting point for any subsequent fine-tuning (such as imitation learning, SFT, or RLHF), pre-training a more powerful base model directly improves the quality of downstream imitation models! Part 2 of this series covers the best open source pre-trained language models.

5

Align open source LLM

(Quoted from [5])

Imitation learning attempts to improve the quality of open source base models by training them on the responses (and explanation traces) of proprietary large language models (LLMs). While this approach has produced promising results in specific cases, it is (obviously) not how top proprietary models are trained: imitation is just a shortcut for quickly building powerful open source models. If we expect open source LLMs to match the performance of proprietary models, greater investment in alignment is needed.

"These closed-source, production-grade LLMs have been extensively fine-tuned to align better with human preferences, which greatly improves their usability and safety. However, this step can require significant computing and human annotation resources, and it often lacks transparency and is difficult to reproduce." ——Quoted from [1]

What are the obstacles to alignment? The idea of aligning open source models seems simple: since we already have excellent base models, why not just copy the alignment process of a model like GPT-4? The problem is that alignment requires extensive compute and human annotation and relies heavily on proprietary data, which limits transparency and makes results very difficult to reproduce. As a result, open source models have lagged behind their proprietary counterparts in alignment research for some time. Next, we explore two recent studies, LIMA [2] and LLaMA-2 [1], that significantly improve the quality of open source LLMs through better alignment.

6

Preparatory work for open source alignment

Before introducing LIMA and LLaMA-2, it is worth noting that the open source research community has not completely shied away from aligning pre-trained models. For example, Falcon-40B-Instruct [13] performed supervised fine-tuning (SFT) on 150 million tokens of chat data from Baize. Similarly, a number of fine-tuned variants of MPT-7B [14] and MPT-30B [15] have been released, including Chat/Instruct variants fine-tuned with SFT on public datasets and a StoryWriter variant fine-tuned on data with longer contexts.


(Quoted from the Open LLM leaderboard)

In addition, a quick look at the Open LLM leaderboard (shown above) reveals that many models have been fine-tuned via SFT on various datasets; open source LLMs do not completely avoid the alignment process.

However, top proprietary models use both SFT and RLHF, aligned on large-scale, high-quality dialogue and human feedback datasets, whereas most open source models are aligned only on lower-quality public datasets that lack diversity. To truly match the quality of proprietary models, therefore, open source LLMs need to attempt to replicate their alignment processes.

7

LIMA: Efficient data alignment [2]

"A model's knowledge and capabilities are learned almost entirely during pre-training, while alignment teaches it which sub-distribution of formats should be used when interacting with users." ——Quoted from [2]

As mentioned above, open source LLMs have long been aligned mainly by performing SFT on public datasets. Given this emphasis on SFT, the authors of [2] conducted an in-depth exploration of the role of SFT in pre-trained LLMs. The goal is to reveal the relative importance of pre-training versus SFT-based alignment in creating high-quality LLMs, and to identify best practices for maximizing model performance after SFT.

Dataset: To achieve this goal, the authors of [2] constructed a small dataset containing 1,000 dialogue examples for SFT. Although this amount of data may seem small, the examples were carefully selected to ensure quality through diverse prompts and a uniform output style or tone; see the figure below for details.

(Quoted from [2])

Although the SFT dataset used to train LIMA is small, its quality is excellent. Interestingly, in [2] we see that LIMA performs exceptionally well after being fine-tuned on this dataset, even approaching the performance of state-of-the-art LLMs such as GPT-4 and Claude, as detailed in the figure below.


(Quoted from [2])

This result shows that language models can be effectively aligned with a small number of carefully selected examples. Although LIMA still underperforms GPT-4, achieving such high-quality alignment with so little data is both unexpected and impressive. This finding tells us that data quality appears to be the most important factor when performing SFT.


What can we learn? We learned many valuable lessons from LIMA. First, for SFT, simply increasing the amount of data is not enough; data quality is also crucial, as detailed in the figure above. Furthermore, [2] proposes a unique new perspective on the alignment problem, the "superficial alignment hypothesis".

In short, this hypothesis holds that most of an LLM's core knowledge is learned during pre-training, and the key to alignment is finding an appropriate format or style in which to present that knowledge. Alignment can therefore be learned in a data-efficient manner.

8

LLaMA-2: Improving transparency in alignment studies [1]

"Llama 2-Chat is the result of months of research and iterative application of alignment techniques, including instruction tuning and RLHF, requiring significant computational and annotation resources." - citation Since[1]

The recently released LLaMA-2 [1] suite consists of several open source models with parameter counts ranging from 7 billion to 70 billion. Compared to its predecessor LLaMA-1, LLaMA-2 is pre-trained on 40% more data (2 trillion tokens), has a longer context length, and uses an architecture optimized for fast inference (grouped-query attention [4]). LLaMA-2 became the state of the art among open source models.
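As a rough illustration of grouped-query attention [4], the sketch below lets several query heads share a single key/value head, shrinking the key/value cache that must be kept around during inference. The dimensions, naming, and the omission of details such as rotary embeddings are simplifications for illustration, not LLaMA-2's exact implementation.

```python
# Illustrative grouped-query attention: many query heads share a few key/value heads,
# which shrinks the KV cache during inference (simplified; no rotary embeddings, etc.).
import torch
import torch.nn as nn
import torch.nn.functional as F

class GroupedQueryAttention(nn.Module):
    def __init__(self, d_model=512, n_heads=8, n_kv_heads=2):
        super().__init__()
        assert n_heads % n_kv_heads == 0
        self.n_heads, self.n_kv_heads = n_heads, n_kv_heads
        self.head_dim = d_model // n_heads
        self.q_proj = nn.Linear(d_model, n_heads * self.head_dim, bias=False)
        self.k_proj = nn.Linear(d_model, n_kv_heads * self.head_dim, bias=False)  # fewer K heads
        self.v_proj = nn.Linear(d_model, n_kv_heads * self.head_dim, bias=False)  # fewer V heads
        self.o_proj = nn.Linear(n_heads * self.head_dim, d_model, bias=False)

    def forward(self, x):
        B, T, _ = x.shape
        q = self.q_proj(x).view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).view(B, T, self.n_kv_heads, self.head_dim).transpose(1, 2)
        v = self.v_proj(x).view(B, T, self.n_kv_heads, self.head_dim).transpose(1, 2)
        # Each key/value head serves a group of (n_heads // n_kv_heads) query heads.
        group = self.n_heads // self.n_kv_heads
        k = k.repeat_interleave(group, dim=1)
        v = v.repeat_interleave(group, dim=1)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.o_proj(out.transpose(1, 2).reshape(B, T, -1))

x = torch.randn(2, 16, 512)       # (batch, sequence length, model dim)
y = GroupedQueryAttention()(x)    # same shape as the input
```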

The LLaMA-2 suite does not only contain pre-trained LLMs, however: each model is also fine-tuned (using both SFT and RLHF) on large-scale conversation data and human feedback, with substantial effort invested in alignment. The resulting models are called LLaMA-2-Chat.


(Quoted from [5])

These optimized versions of LLaMA-2 perform well and take an important step toward bridging the alignment gap between open source and proprietary LLMs. LLaMA-2's alignment process emphasizes two key behavioral properties:

1. Usefulness: The model fulfills the user's request and provides the required information.

2. Safety: The model avoids generating "unsafe" content.

To ensure that the aligned models are both useful and safe, the data provided for SFT and RLHF were screened, collected, and annotated according to the above principles.


(Quoted from [1])

SFT: The first step in LLaMA-2's alignment process is SFT. Like other open source LLMs, LLaMA-2 is first fine-tuned on publicly available instruction tuning data. However, such data often lacks diversity and quality, which, as LIMA [2] shows, can seriously affect model performance. The authors of [1] therefore focused on collecting a small set of high-quality data for SFT. This data comes from a variety of sources, including manually created or annotated examples, as well as data obtained from public sources and filtered for quality. In this second stage, LLaMA-2 is fine-tuned on 27,540 high-quality dialogue examples; see the figure above for specific examples.

"Surprisingly, we found that the outputs sampled from the resulting SFT model were often competitive with SFT data handwritten by human annotators, suggesting that we could reprioritize and devote more annotation effort to preference-based annotation for RLHF." ——Quoted from [1]

Interestingly, the authors of [1] observed diminishing returns from collecting more than 27K high-quality examples for SFT. These findings are consistent with LIMA's empirical analysis [2]: we do not need a huge amount of data for SFT, but the data must be of high quality. The authors of [1] also noticed that the LLaMA-2 model after SFT seemed able to generate SFT data on its own.

RLHF: LLaMA-2 is further fine-tuned with RLHF on over 1 million examples of human feedback. To collect this feedback, a binary comparison protocol is used: human annotators write a prompt and choose the better of two responses generated by the LLM, yielding human preference data based on usefulness and safety criteria. For example, safety-focused preference annotation might ask annotators to design an adversarial prompt likely to elicit an unsafe response; the annotators then mark which response is preferable and safer (if either).
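The authors of [1] also record how strongly the annotator prefers the chosen response and fold that rating into the reward model's ranking loss as a margin. A hedged sketch of such a margin-augmented loss is shown below; the specific margin values are illustrative, and r_chosen / r_rejected stand for scores produced by a reward model like the one sketched earlier in this article.

```python
# Sketch of a margin-augmented ranking loss in the spirit of [1]; margin values are illustrative.
import torch
import torch.nn.functional as F

# Annotators state how strongly they prefer the chosen response; stronger preferences
# translate into a larger margin by which the chosen score must exceed the rejected score.
MARGINS = {
    "significantly_better": 3.0,
    "better": 2.0,
    "slightly_better": 1.0,
    "negligibly_better": 0.0,
}

def reward_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor, rating: str) -> torch.Tensor:
    margin = MARGINS[rating]
    # -log sigmoid(r_chosen - r_rejected - margin)
    return -F.logsigmoid(r_chosen - r_rejected - margin)

# Dummy scalar scores standing in for reward-model outputs on one preference pair:
loss = reward_loss(torch.tensor(1.2), torch.tensor(0.3), "better")
```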

"All other things being equal, improvements in the reward model can directly translate into improvements in LLaMA 2-Chat." ——Quoted from [1]

Human feedback data is collected in batches, and LLaMA-2 is fine-tuned via RLHF between batches. As a result, multiple versions of each LLaMA-2-Chat model are created iteratively through successive rounds of RLHF, five versions in total. In [1] we see that each time new human preference data is collected, a new reward model is trained for RLHF, ensuring that the reward model accurately captures human preferences over the latest model. Moreover, the quality of the reward model turns out to be a good predictor of the overall quality of LLaMA-2-Chat. Across all iterations of RLHF, LLaMA-2 uses more than one million examples of human feedback for fine-tuning.


(Quoted from [1])

As shown above, the quality of LLaMA-2-Chat (in terms of both usefulness and safety) improves smoothly over successive iterations of SFT and RLHF. This visualization clearly shows how much each technique contributes to the resulting model's quality: SFT alone has limited effect, but each subsequent RLHF stage significantly improves the model's alignment beyond what SFT achieves.


The top five models on the Open LLM leaderboard are all based on LLaMA-2 (from the OpenLLM leaderboard)

Quality: As shown in the Open LLM leaderboard above, the LLaMA-2-Chat models are currently the state of the art among open source LLMs. Comparing against other popular LLMs in [1], we see that the LLaMA-2-Chat models far surpass other open source models in both usefulness and safety; see the figure below for details.

(Quoted from [1])

Furthermore, LLaMA-2's performance in terms of usefulness and safety is even comparable to top proprietary models like ChatGPT. In short, these results strongly suggest that the LLaMA-2-Chat models are well aligned and accurately capture and adhere to the usefulness and safety standards expected by humans.

"[Alignment] may require significant computational and manual annotation costs, and is often not transparent or easily reproducible, which hinders the community's progress in AI alignment research." ——Quoted from [1]

The importance of LLaMA-2: LLaMA-2 not only sets a new quality benchmark for open source LLM research, but also takes a fundamentally different approach from previous work. From [2], we understand that proprietary LLMs often rely on large amounts of specially annotated data for alignment, a process that is difficult to reproduce in open source research. While previous open source models mainly leveraged SFT and public conversation data, LLaMA-2 is one of the first open source LLMs to invest heavily in the alignment process, curating a large volume of high-quality conversations and human preferences for use in SFT and RLHF. This makes LLaMA-2 not only a breakthrough in quality, but also an important methodological contribution to open source LLM research.

9

Conclusion

This series of articles has explored the development of open source language models from OPT to LLaMA-2. Although a great deal of research happened between these two models, they were released only about a year apart! The open source AI research community is growing rapidly, and following research in this area is exciting, fun, and rewarding. Powerful models like LLaMA-2-Chat are awe-inspiring. As practitioners and researchers, the opportunity we have to use these models, learn from them, and gain insights into how they work is unique and should be cherished. Especially for LLMs, open source research is really cool!

References

[1] Touvron, Hugo, et al. "Llama 2: Open Foundation and Fine-Tuned Chat Models." arXiv preprint arXiv:2307.09288 (2023). 

[2] Zhou, Chunting, et al. "Lima: Less is more for alignment." arXiv preprint arXiv:2305.11206 (2023).

[3] Touvron, Hugo, et al. "Llama: Open and efficient foundation language models." arXiv preprint arXiv:2302.13971 (2023).

[4] Ainslie, Joshua, et al. "GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints." arXiv preprint arXiv:2305.13245 (2023).

[5] “Introducing Llama2: The next generation of our open source large language model”, Meta, https://ai.meta.com/llama/.

[6] Gudibande, Arnav, et al. "The false promise of imitating proprietary llms." arXiv preprint arXiv:2305.15717 (2023).

[7] Taori, Rohan, et al. "Stanford Alpaca: An Instruction-following LLaMA model." (2023).

[8] Chiang, Wei-Lin et al. “Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality.” (2023).

[9] Geng, Xinyang et al. “Koala: A Dialogue Model for Academic Research.” (2023).

[10] Yuvanesh Anand, Zach Nussbaum, Brandon Duderstadt, Benjamin Schmidt, and Andriy Mulyar. GPT4All: Training an assistant-style chatbot with large scale data distillation from GPT-3.5-Turbo, 2023.

[11] Wang, Yizhong, et al. "Self-instruct: Aligning language model with self generated instructions." arXiv preprint arXiv:2212.10560 (2022).

[12] Mukherjee, Subhabrata, et al. "Orca: Progressive Learning from Complex Explanation Traces of GPT-4." arXiv preprint arXiv:2306.02707 (2023).

[13] “Introducing Falcon LLM”, Technology Innovation Institute, https://falconllm.tii.ae/.

[14] “Introducing MPT-7B: A New Standard for Open-Source, Commercially Usable Llms.” MosaicML, www.mosaicml.com/blog/mpt-7b.

[15] “MPT-30B: Raising the Bar for Open-Source Foundation Models.” MosaicML, www.mosaicml.com/blog/mpt-30b.

[16] Gou, Jianping, et al. "Knowledge distillation: A survey." International Journal of Computer Vision 129 (2021): 1789-1819.

[17] Ouyang, Long, et al. "Training language models to follow instructions with human feedback." Advances in Neural Information Processing Systems 35 (2022): 27730-27744.

[18] Glaese, Amelia, et al. "Improving alignment of dialogue agents via targeted human judgements." arXiv preprint arXiv:2209.14375 (2022).

Notes

1. Let’s stop writing here for now! I'm sure I'll write another article to share as I continue my research into open source LLM.

2. This "recipe" - often called the three-step technique - was proposed by InstructGPT (a sister model of ChatGPT) and has been adopted by many powerful LLMs since its proposal!

3. I'm not sure whether imitation learning counts as alignment. It is very similar to SFT, where we select conversation examples from existing powerful LLMs (e.g., GPT-4) for SFT. One can think of imitation learning as a kind of general fine-tuning, or even a variant of instruction fine-tuning.

4. This metric is obtained through automated evaluation using GPT-4 as the evaluator.

5. Orca uses the prompts collected by FLAN to generate the imitation dataset, a process that takes several weeks due to rate/token limitations of the OpenAI API.

6. Interestingly, the authors of [1] adopted two different RLHF methods: the typical PPO-based RLHF variant, and a rejection-sampling fine-tuning variant that i) samples K outputs from the model; ii) chooses the best output; iii) fine-tunes on that example. It is worth noting that both methods are based on reinforcement learning.

7. Just like imitation learning, this public data can even come from other powerful LLMs; for example, the conversation data provided by ShareGPT.


Try OneFlow: https://github.com/Oneflow-Inc/oneflow/


Origin: blog.csdn.net/OneFlow_Official/article/details/134343908