Renmin University of China: Overview of the Latest September Upgrade of the Large Language Model Survey

From: RUC AI Box


At the end of March this year, we released the first version (V1) of our large language model survey, "A Survey of Large Language Models," on arXiv. The survey systematically reviews the research progress and core techniques of large language models and discusses a large body of related work. Since the preprint went online, it has received widespread attention and valuable comments from many readers.


In the five months since the release of V1, we have continuously updated the survey and revised it across multiple editions to improve its quality (the version number has now reached V12). The paper has grown from 51 pages and 416 references in V1, to 85 pages and 610 references in V11, and now to 97 pages and 683 references in V12. Following the major revision V11 released on arXiv at the end of June, V12 is another major revision completed over the past two-plus months.

Compared with V11, the V12 version of the survey offers the following new highlights:

  1. Added a brief introduction to emerging architectures, attention mechanisms, and decoding strategies;

  2. Added an introduction to practical tips for instruction fine-tuning;

  3. Added an overview of RLHF and non-RL alignment methods;

  4. Improved the experimental analysis, adding the latest models to the instruction fine-tuning and capability evaluation experiments;

  5. Added a discussion of evaluation methods and a summary of existing evaluation work;

  6. Added substantial clarifying content, as well as introductions to much of the latest work.

In addition, the Chinese translation of our survey is also continuously updated (it currently corresponds to V10 and will continue to be updated):

  • Paper link: https://arxiv.org/abs/2303.18223

  • GitHub project link: https://github.com/RUCAIBox/LLMSurvey

  • Chinese translation version link: https://github.com/RUCAIBox/LLMSurvey/blob/main/assets/LLM_Survey_Chinese.pdf

The following introduces the main updates in selected chapters of the survey. For details, please refer to the English version.

1. Resources for large language models

We have added the latest models that meet our selection criteria and continue to update the existing figures and tables of 10B+ models (readers are welcome to write in with any omissions):


2. Pre-training techniques for large language models

In the model architecture part: because the attention mechanism of the classic Transformer architecture requires quadratic time complexity, a series of new language modeling architectures have recently been explored, such as S4, RWKV, and RetNet. These aim to retain the Transformer's advantage of parallel training on GPUs while enabling low-complexity, highly efficient decoding and inference. In addition, some work focuses on improving the attention mechanism or its computation within the traditional Transformer architecture to make training and deployment more efficient. We have added brief introductions to several new attention mechanisms, including grouped-query attention, FlashAttention-2, and PagedAttention; a small sketch of grouped-query attention is given below.
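To make the idea concrete, here is a minimal PyTorch sketch of grouped-query attention, written for this post as an illustration (it is not code from the survey): several query heads share a smaller set of key/value heads, which shrinks the KV cache during decoding.

```python
import torch
import torch.nn.functional as F

def grouped_query_attention(q, k, v):
    """Minimal grouped-query attention sketch (illustrative only).

    q: (batch, num_q_heads, seq, head_dim)
    k, v: (batch, num_kv_heads, seq, head_dim), with num_kv_heads < num_q_heads
    """
    num_q_heads, head_dim = q.shape[1], q.shape[-1]
    num_kv_heads = k.shape[1]
    group_size = num_q_heads // num_kv_heads  # query heads per KV head
    # Repeat each KV head so it is shared by `group_size` query heads.
    k = k.repeat_interleave(group_size, dim=1)
    v = v.repeat_interleave(group_size, dim=1)
    scores = q @ k.transpose(-2, -1) / head_dim ** 0.5
    return F.softmax(scores, dim=-1) @ v

# Example: 8 query heads share 2 KV heads, a 4x smaller KV cache than MHA.
q = torch.randn(1, 8, 16, 64)
k = torch.randn(1, 2, 16, 64)
v = torch.randn(1, 2, 16, 64)
print(grouped_query_attention(q, k, v).shape)  # torch.Size([1, 8, 16, 64])
```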

In addition, we have added a new sub-section on decoding strategies, introducing the two basic strategies, greedy search and random sampling, and surveying improved algorithms built on them, such as beam search, top-k sampling, and top-p sampling (a small sketch of the latter two is given below). We also introduce efficient decoding strategies for large models, as well as common decoding settings for specific models and APIs.
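As a hedged illustration of the sampling strategies mentioned above, here is a toy implementation of top-k followed by top-p (nucleus) filtering over a next-token distribution; it follows the standard formulation rather than any specific model's code.

```python
import torch
import torch.nn.functional as F

def sample_next_token(logits, top_k=50, top_p=0.9, temperature=1.0):
    """Toy top-k + top-p sampling over a 1-D next-token logits vector."""
    logits = logits / temperature
    # Top-k: keep only the k highest-scoring tokens.
    kth_value = torch.topk(logits, top_k).values[-1]
    logits = logits.masked_fill(logits < kth_value, float("-inf"))
    # Top-p: keep the smallest prefix of tokens (by probability) whose
    # cumulative probability exceeds p.
    probs = F.softmax(logits, dim=-1)
    sorted_probs, sorted_idx = torch.sort(probs, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    cutoff = cumulative > top_p
    cutoff[1:] = cutoff[:-1].clone()  # shift so the boundary token is kept
    cutoff[0] = False                 # always keep the most likely token
    sorted_probs[cutoff] = 0.0
    sorted_probs = sorted_probs / sorted_probs.sum()  # renormalize
    choice = torch.multinomial(sorted_probs, num_samples=1)
    return sorted_idx[choice].item()

print(sample_next_token(torch.randn(32000)))  # toy 32k-token vocabulary
```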

3. Adaptation techniques for large language models

In the adaptation techniques chapter, we have added substantial discussion and experimental analysis.

In the instruction fine-tuning section, we have added practical tips for instruction fine-tuning (a toy example of the typical data format is sketched below). In the experimental part, we added an instruction fine-tuning experiment on the LLaMA-13B model to analyze different mixtures of instruction datasets.
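For readers unfamiliar with instruction data, here is a toy example in the Alpaca-style prompt format; this format is an assumption for illustration, not the exact setup of the survey's experiments.

```python
# A toy instruction-tuning example in the Alpaca-style format (an
# illustrative assumption, not the survey's exact experimental setup).
example = {
    "instruction": "Summarize the following sentence in five words.",
    "input": "Large language models have rapidly advanced NLP research.",
    "output": "LLMs rapidly advanced NLP research.",
}

PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)

# During supervised fine-tuning, the loss is usually computed only on the
# response tokens appended after the prompt.
full_text = PROMPT_TEMPLATE.format(**example) + example["output"]
print(full_text)
```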


In the alignment fine-tuning section, to help researchers implement RLHF quickly and effectively, we provide an introduction to practical strategies for RLHF, including how to train reward models effectively and how to perform reinforcement learning training efficiently; we hope this offers follow-up researchers a constructive reference (a sketch of the standard reward-model training loss is given below). Furthermore, we have significantly expanded our coverage of existing non-RL alignment methods. Unlike RLHF, which collects feedback data manually, this line of work mainly uses a reward model or the large model itself to collect feedback data automatically for alignment, and fine-tunes the large model with various supervised training methods.
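As background for the reward-model discussion, here is a minimal sketch of the pairwise ranking loss commonly used to train RLHF reward models; it shows the standard formulation, not the survey's specific training recipe.

```python
import torch
import torch.nn.functional as F

def reward_model_loss(chosen_rewards, rejected_rewards):
    """Pairwise ranking loss commonly used for RLHF reward models:
    -log(sigmoid(r_chosen - r_rejected)) pushes the reward of the
    human-preferred response above that of the rejected one.
    """
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy scalar rewards assigned to preferred vs. rejected responses.
chosen = torch.tensor([1.2, 0.3, 0.8])
rejected = torch.tensor([0.5, 0.7, -0.1])
print(reward_model_loss(chosen, rejected))
```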

Finally, we also added a discussion comparing the two training methods, SFT and RLHF.

4. Techniques for using large language models

After pre-training or adaptation, a main way to use LLMs is to design suitable prompting strategies to solve various tasks. We have added Table 9 to summarize representative work on existing prompting approaches, including typical ways of applying LLMs and the respective focus of ICL, CoT, and planning (a toy chain-of-thought prompt is sketched below).
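To illustrate the chain-of-thought (CoT) prompting style mentioned above, here is a toy few-shot prompt we wrote for this post; the demonstration includes intermediate reasoning steps that the model is encouraged to imitate.

```python
# A toy few-shot chain-of-thought (CoT) prompt (our own illustrative
# example): the demonstration shows intermediate reasoning, so the model
# tends to produce step-by-step reasoning before its final answer.
cot_prompt = """\
Q: Roger has 5 balls. He buys 2 cans of 3 balls each. How many balls now?
A: Roger started with 5 balls. 2 cans of 3 balls is 6 balls. 5 + 6 = 11.
The answer is 11.

Q: A library has 120 books and lends out 45. How many books remain?
A:"""

# Sending `cot_prompt` to an LLM typically elicits reasoning steps
# followed by a final answer such as "The answer is 75."
print(cot_prompt)
```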


In addition, to handle long-horizon tasks, using long-term memory to aid planning is an important approach. We have added recent work that uses memory mechanisms for planning, including Reflexion and MemoryBank.


5. Evaluating the capabilities of large language models

In terms of evaluating large model capabilities, we have added a sub-section discussing evaluation methods, introducing the evaluation work related to base models, fine-tuned models, and specialized models respectively. We discuss the advantages and disadvantages of three types of evaluation methods, benchmark evaluation, human evaluation, and model-based evaluation, and summarize existing evaluation work in a table.


In addition, with the release of new large language models, we have added evaluation results for several popular models in the empirical evaluation chapter, including LLaMA 2 (Chat) 7B, Claude-2, and Vicuna 13B, together with additional experimental discussion of these new models.


6. Summary and positioning

A high-quality long survey requires a large time investment, and the teachers and students involved have devoted a great deal of time to it. Although we have done our best to improve this survey, given our limited abilities, deficiencies and errors are inevitable, and there is still much room for improvement. Our ultimate goal is to make this survey a "know-how" technical guide to large models, so that their secrets are no longer mysterious and their technical details no longer hidden. Although we are well aware that the current survey is still far from this goal, we are willing to do our best to improve it in subsequent versions. In particular, we welcome readers to contribute ideas and suggestions regarding pre-training, instruction fine-tuning, the inner workings of prompt engineering, and practical experience; you can submit a PR on GitHub or contact the authors by email. For all adopted technical details, we will give real-name acknowledgment of the actual contribution in the acknowledgments section of the paper.

Since its publication, our survey has received a large number of revision suggestions from readers, for which we are sincerely grateful. We hope everyone will continue to support and follow our large model survey; your likes and feedback are our greatest motivation.

7. Students participating in this revision

Student authors: Zhou Kun (added the task setup and result analysis of the instruction fine-tuning experiments, added the experimental setup and result analysis of the capability evaluation experiments, added the introduction to practical tips for instruction fine-tuning, and added the introduction to practical strategies for RLHF), Li Junyi (added the introduction to non-RL alignment methods), Tang Tianyi (added the introduction to decoding strategies), Wang Xiaolei (added the introduction to evaluation methods), Hou Yupeng (added textual details in Chapter 4, updated Figure 5), Min Yingqian (added niche models and related introductions in Chapter 3, updated Table 1 and Figure 2), Zhang Beichen (added Table 10), Chen Yushuo (Table 8 experiments), Chen Zhipeng (Table 12 experiments), Jiang Jinhao (Table 12 experiments), Ren Ruiyang (Table 12 experiments), Tang Xinyu (Table 12 experiments)

Student volunteers: Cheng Xiaoxue (Table 12 experiments), Wang Yuhao (Table 12 experiments), Zheng Bowen (Table 12 experiments)

Attachment: Update log

V1 (March 31, 2023): Initial version.

V2 (April 9, 2023): Added institutional information; revised Figure 1 and Table 1 and clarified the selection criteria for large language models; improved the writing; corrected some minor errors.

V3 (April 11, 2023): Fixed errors regarding library resources.

V4 (April 12, 2023): Revised Figure 1 and Table 1 and clarified the release dates of some large language models.

V5 (April 16, 2023): Added a chapter on the technical evolution of the GPT-series models.

V6 (April 24, 2023): Added some new models to Table 1 and Figure 1; added a discussion of scaling laws; added explanations of the model sizes for emergent abilities (Section 2.1); added illustrations of attention patterns for different architectures in Figure 4; added detailed formulas in Table 4.

V7 (April 25, 2023): Fixed some copy errors in figures and tables.

V8 (April 27, 2023): Added the parameter-efficient adaptation section in Section 5.3.

V9 (April 28, 2023): Revised Section 5.3.

V10 (May 7, 2023): Revised Table 1, Table 2, and some details.

V11 (June 29, 2023):
Chapter 1: added Figure 1, a trend chart of large-language-model papers published on arXiv;
Chapter 2: added Figure 3 showing the evolution of the GPT-series models and corresponding discussion;
Chapter 3: added Figure 4 showing the LLaMA family and corresponding discussion;
Chapter 5: added the latest discussion of synthesizing instruction-tuning data in Section 5.1.1, added an empirical analysis of instruction tuning in Section 5.1.4, added a discussion of parameter-efficient adaptation in Section 5.3, and added a discussion of memory-efficient adaptation in Section 5.4;
Chapter 6: added the latest discussion of the underlying mechanism of ICL in Section 6.1.3, and added a discussion of planning for complex task solving in Section 6.3;
Chapter 7: added Table 10 of representative datasets for evaluating the advanced abilities of LLMs in Section 7.2, and added a comprehensive capability evaluation of large language models in Section 7.3.2;
Chapter 8: added prompt design;
Chapter 9: added a discussion of applications of large language models in finance and scientific research.

V12 (September 11, 2023):
Chapter 3: added new models to Table 1 and Figure 2;
Chapter 4: added a discussion of new architectures in Section 4.2.1, introductions to several attention mechanisms in Section 4.2.2, and an introduction to decoding strategies in Section 4.2.4;
Chapter 5: added practical tips for instruction fine-tuning in Section 5.1.2, added the LLaMA-13B instruction fine-tuning experiments and analysis in Section 5.1.4 and Table 8, added practical strategies for RLHF in Section 5.2.3, introduced alignment methods without RLHF in Section 5.2.4, and added a discussion of SFT versus RLHF in Section 5.2.5;
Chapter 6: added Table 9 summarizing representative work on prompting, and updated the introduction to memory in the planning part of Section 6.3;
Chapter 7: added a discussion of evaluation methods in Section 7.3.2, added Table 11 summarizing existing evaluation work, and updated the empirical capability evaluation and evaluation results in Section 7.4 and Table 12.

