Renmin University of China publishes the largest and most comprehensive survey of large language models to date

From: RUC AI Box


At the end of March this year, we published the first version (V1) of our survey of large language models, "A Survey of Large Language Models", on arXiv. The survey systematically organizes the research progress and core techniques of large language models and discusses a large body of related work. Since the preprint went online, it has received extensive attention and valuable comments from many readers.


In the three months since the release of V1, we have continued to update and revise the survey across many editions (the version number has now reached V11) in order to improve its quality. The paper has grown from 51 pages and 416 references in V1 to 85 pages and 610 references in V11. V11 is a major revision that we began planning in mid-to-late May and re-released on arXiv at the end of June (see the end of this article for the detailed update log). Compared with V1, the V11 version of the survey has the following new highlights:

  1. Added an introduction to the LLaMA family, consisting of the LLaMA model and its derivative models;

  2. Added specific experimental analyses, including an instruction fine-tuning dataset combination experiment and a comprehensive capability evaluation of selected models;

  3. Added a prompt design guide for large language models and related experiments, summarizing the principles and practical experience of prompt design;

  4. Added chapters on parameter-efficient adaptation and space-efficient adaptation, summarizing lightweight techniques for large language models;

  5. Added an introduction to related work on planning;

  6. Added substantial contextual content, as well as introductions to a large number of recent works.

In addition, the Chinese translation of our survey is also being continuously updated (it was translated from the V1 version and is updated continuously).

  • Paper link: https://arxiv.org/abs/2303.18223

  • GitHub project link: https://github.com/RUCAIBox/LLMSurvey

  • Chinese translation link: https://github.com/RUCAIBox/LLMSurvey/blob/main/assets/LLM_Survey__Chinese_V1.pdf

1. Introduction

Large language models have become a hotspot in academic research. We plotted the trend in the number of arXiv papers containing the keyword "language model" since June 2018 and the keyword "large language model" since October 2019. The results show that, after the release of ChatGPT, the number of related papers has grown explosively, which fully demonstrates that the influence of large language models in academia has become increasingly prominent, attracting more and more researchers to the field.


2. Overview

Compared with small models, large models scale up the model size, the amount of training data, and the total compute, which significantly improves the capability of language models. In the overview chapter, we added a discussion of scaling laws, focusing on the KM scaling law and the Chinchilla scaling law, which provide important references for understanding the performance improvements of large language models.

  • KM scaling law

  • Chinchilla scaling law

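For reference, a brief sketch of the two scaling laws in their commonly stated forms (notation follows the usual conventions; the fitted constants and exponents come from the original papers and are omitted here):

```latex
% KM scaling law (Kaplan et al., 2020): loss as a power law in model size N,
% dataset size D, and compute C, with each factor varied in isolation
L(N) = \left(\frac{N_c}{N}\right)^{\alpha_N},\qquad
L(D) = \left(\frac{D_c}{D}\right)^{\alpha_D},\qquad
L(C) = \left(\frac{C_c}{C}\right)^{\alpha_C}

% Chinchilla scaling law (Hoffmann et al., 2022): joint fit over N and D
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}

% Compute-optimal allocation under a budget C \approx 6ND
N_{\mathrm{opt}}(C) \propto C^{\frac{\beta}{\alpha + \beta}},\qquad
D_{\mathrm{opt}}(C) \propto C^{\frac{\alpha}{\alpha + \beta}}
```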

In addition, we added a new introduction to the stages of technical evolution of OpenAI's GPT series of language models (with an accompanying figure). This part helps readers understand how the GPT series started from the original GPT model and gradually evolved into more advanced large language models such as ChatGPT and GPT-4.


For "predicting the next word", the core idea behind the GPT series, we further added excerpts from interviews with Ilya Sutskever.


3. Large language model resources

We have added the latest models that meet our selection criteria and continuously updated the figure of existing models with more than 10B parameters.


In February 2023, Meta released the LLaMA large language model. Thanks to its strong base capabilities, the release of LLaMA sparked a wave of extensions in the open-source community. A large number of researchers have performed instruction fine-tuning or continued pre-training on top of LLaMA, giving rise to many high-quality open-source large language models. To help readers understand the development of the LLaMA family, we added an introduction to its evolution and drew a concise LLaMA family tree showing how the family has developed and how the various derived models relate to one another.


4. Large language model pre-training techniques

In the chapter on pre-training techniques, we greatly expanded the technical details of the various aspects of pre-training large models. In the model architecture section, we added a comparison figure of the three mainstream architectures, namely the causal decoder, the prefix decoder, and the encoder-decoder architecture, to intuitively show their differences and connections.


In addition, we supplemented the details of each component of the model architecture, including tokenization, the normalization method, the normalization position, position encoding, attention and bias, and provided a detailed formula table for the various configurations of the Transformer architecture. In the concluding discussion, we examine the challenges of long-text encoding and generation that are of wide concern to researchers.

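As a rough illustration of two of these configuration choices (this is our own sketch, not the survey's full formula table), the post-norm and pre-norm sublayer arrangements and the RMSNorm variant can be written as:

```latex
% Post-norm (original Transformer) vs. pre-norm (common in recent LLMs)
\text{Post-Norm:}\quad x_{l+1} = \mathrm{LayerNorm}\big(x_l + \mathrm{Sublayer}(x_l)\big)
\text{Pre-Norm:}\quad  x_{l+1} = x_l + \mathrm{Sublayer}\big(\mathrm{LayerNorm}(x_l)\big)

% RMSNorm: rescale by the root mean square only, with no mean subtraction
\mathrm{RMSNorm}(x) = \frac{x}{\sqrt{\frac{1}{d}\sum_{i=1}^{d} x_i^2 + \epsilon}} \odot g
```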

For tokenizing the pre-training data, we introduce three commonly used algorithms: BPE, WordPiece and Unigram.

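As a minimal sketch of the BPE idea, repeatedly merging the most frequent adjacent symbol pair, here is a toy trainer of our own (for illustration only; production tokenizers add byte-level handling, special tokens and many optimizations):

```python
from collections import Counter

def bpe_train(corpus, num_merges):
    """Toy byte-pair-encoding trainer: corpus is a list of words."""
    vocab = [list(word) for word in corpus]      # start from characters
    merges = []
    for _ in range(num_merges):
        # Count all adjacent symbol pairs across the corpus
        pairs = Counter()
        for word in vocab:
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += 1
        if not pairs:
            break
        best = max(pairs, key=pairs.get)         # most frequent pair
        merges.append(best)
        # Merge every occurrence of the best pair into a single symbol
        new_vocab = []
        for word in vocab:
            merged, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    merged.append(word[i] + word[i + 1])
                    i += 2
                else:
                    merged.append(word[i])
                    i += 1
            new_vocab.append(merged)
        vocab = new_vocab
    return merges

# Example: learn three merges from a tiny corpus
print(bpe_train(["lower", "lowest", "newer", "wider"], 3))
```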

5. Large language model adaptation techniques

In the adaptation chapter, we expanded the technical details of instruction fine-tuning, including methods for collecting instructions, the effects of instruction fine-tuning, and the results of instruction fine-tuning together with the corresponding analysis. First, we introduce instruction data collection methods for three types of instructions, namely task instructions, chat instructions and synthetic instructions, and compile the corresponding instruction sets.


We also updated the schematic diagram of how instruction sets are created.

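For readers unfamiliar with what an instruction instance looks like, a typical formatted record is roughly as follows (this Alpaca-style layout is an illustrative assumption on our part, not a format prescribed by the survey):

```python
# One formatted instruction instance: task description, optional input, desired output
instruction_example = {
    "instruction": "Summarize the following paragraph in one sentence.",
    "input": "Large language models are trained on massive text corpora ...",
    "output": "Large language models learn general language ability from large-scale text.",
}
```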

Second, to explore the impact of different instruction data on model performance, we report the experimental results of instruction-tuned models under different data mixing strategies for readers' reference. To help readers get started with instruction fine-tuning, we also provide a resource reference table for instruction fine-tuning large models, along with practical suggestions.


As large language models attract increasing attention, how to fine-tune and use them in a more lightweight way has also become a hot topic in industry. To this end, we added chapters on parameter-efficient adaptation and space-efficient adaptation. In the parameter-efficient adaptation chapter, we introduce common parameter-efficient adaptation techniques, including Adapter Tuning, Prefix Tuning, Prompt Tuning and LoRA, and list recent concrete practices that combine these techniques with large models.

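To make the idea concrete, here is a minimal LoRA sketch (our own illustrative PyTorch code, not an implementation from the survey): the pre-trained weight is frozen and only a low-rank update is trained.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Linear layer with a frozen pre-trained weight plus a trainable
    low-rank update: y = x W^T + scaling * (x A^T B^T)."""
    def __init__(self, in_features, out_features, rank=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)                   # freeze W
        self.lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x):
        # Frozen path plus the low-rank trainable path
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)

layer = LoRALinear(768, 768, rank=8)
x = torch.randn(2, 10, 768)
print(layer(x).shape)   # torch.Size([2, 10, 768]); only lora_A and lora_B are trainable
```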

At the same time, because large language models have an enormous number of parameters, inference requires a large amount of memory (GPU memory), which leads to high deployment costs in practice. To this end, we introduce space-efficient adaptation techniques and discuss how the memory footprint of large language models can be reduced through model compression methods (model quantization), allowing them to be used under limited resources, and we summarize some core conclusions of this discussion.

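As a minimal illustration of the basic idea behind post-training weight quantization (a simple symmetric round-to-nearest INT8 scheme of our own; the methods discussed in this area, such as GPTQ, are considerably more sophisticated):

```python
import torch

def quantize_int8(w: torch.Tensor):
    """Symmetric per-tensor INT8 quantization: w is approximated by scale * q."""
    scale = w.abs().max() / 127.0
    q = torch.clamp(torch.round(w / scale), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.to(torch.float32) * scale

w = torch.randn(4096, 4096)              # a hypothetical weight matrix
q, scale = quantize_int8(w)
err = (w - dequantize(q, scale)).abs().mean()
print(q.dtype, float(err))               # int8 storage (4x smaller than fp32), small error
```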

6. Large language model utilization techniques

We divide the mechanistic analyses of how large language models perform in-context learning at the inference stage into two categories: task recognition and task learning. The task recognition part introduces how a large language model recognizes the task from the demonstrations and solves it using knowledge acquired during pre-training; the task learning part introduces how a large language model learns to solve new tasks from the demonstrations.
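For readers new to in-context learning, a few-shot prompt typically looks like the following (this tiny sentiment example is our own illustration, not one taken from the survey):

```python
# A few demonstrations followed by a new query; the model continues the pattern
icl_prompt = """Review: The film was a delight from start to finish.
Sentiment: positive

Review: I walked out halfway through.
Sentiment: negative

Review: A moving story with wonderful performances.
Sentiment:"""
```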

Besides in-context learning and chain-of-thought prompting, we also introduce another important paradigm for using large language models, namely prompt-based planning for complex tasks. Following related work, we summarize a general framework for planning-based prompting. Such a paradigm typically contains three components: a task planner, a plan executor, and the environment. We then introduce the basic practices of this paradigm from three aspects: plan generation, feedback acquisition, and plan refinement.

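The interaction among these three components can be sketched as the following loop (purely illustrative code built on the generate-act-refine cycle described above; `llm_generate` and `environment` are hypothetical placeholders, not interfaces from the survey):

```python
def plan_and_solve(task, llm_generate, environment, max_rounds=5):
    """Illustrative planning loop: the task planner (an LLM) proposes a plan,
    the plan executor acts in the environment, and feedback refines the plan."""
    plan = llm_generate(f"Devise a step-by-step plan for: {task}")
    for _ in range(max_rounds):
        feedback = []
        for step in plan.splitlines():            # plan executor
            result = environment.execute(step)    # act in the environment
            feedback.append(f"{step} -> {result}")
        if environment.task_solved():
            return plan
        # Plan refinement: revise the plan using the collected feedback
        plan = llm_generate(
            f"Task: {task}\nPrevious plan and feedback:\n"
            + "\n".join(feedback)
            + "\nGive an improved plan."
        )
    return plan
```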

7. Evaluation of large language model capabilities

Considering the growing capability of large language models in conditional language generation, we introduce a discussion of existing work on the reliability of automatic evaluation of language generation in the era of large language models. For the advanced capabilities of large language models, we added the latest related work and summarized the datasets commonly used to evaluate these advanced capabilities for readers' reference. In addition, as the general abilities of large language models improve, a series of works have proposed more challenging comprehensive evaluation benchmarks based on human examinations, and we added an introduction to these representative benchmarks.


In the era of large language models, open-source and closed-source large language models continue to emerge. We conducted fine-grained capability evaluations on several popular open-source and closed-source models, covering 27 representative tasks corresponding to the 8 basic and advanced capabilities summarized in the evaluation chapter. We further provide a detailed analysis of the evaluation results for the open-source and closed-source models.

To better explain the existing problems of large models, we summarize the key issues in the form of notes.


8. Large language model prompt design guidelines

In the era of large language models, prompts have become an important form of human-machine interaction. Writing good prompts, however, is a craft that requires skill and experience. To allow readers to quickly get started with prompt design for large language models, we provide a practical prompt design guide, detailing the key components of a prompt and discussing several key prompt design principles.

A complete prompt usually contains four key components: the task description, the input data, the contextual information, and the prompt style. To better demonstrate these components, we provide an intuitive table of example prompts.

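As a concrete illustration of these four components (the wording of this template is our own example rather than an entry from the survey's table):

```python
# Task description + contextual information + input data + prompt style
prompt = (
    "You are a careful assistant. Answer concisely.\n"                        # prompt style
    "Task: Classify the sentiment of the review as positive or negative.\n"   # task description
    "Context: The reviews come from a movie review website.\n"                # contextual information
    "Review: The plot was predictable and the acting was flat.\n"             # input data
    "Sentiment:"
)
```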

We also added illustrations of the related prompts.


In addition, we summarize several key prompt design principles, including clearly expressing the task goal, decomposing complex tasks, and using a model-friendly format. Building on these design principles, we further present a series of useful prompt design tips.

Finally, using ChatGPT, we experimented with the impact of different prompts on model performance across a variety of common tasks, for readers to consult when writing prompts for specific tasks.


9. Domain applications of large language models

As large language models attract more and more attention, researchers and industry practitioners are also trying to apply them to various professional fields. To introduce these application practices systematically, we separated the domain application part of the survey into a dedicated chapter. Specifically, we expanded the original discussion of applying large language models to healthcare, education, and law, and added new introductions to related work in finance and scientific research.

10. Seeking Advice and Computing Power

A high-quality, long-form survey requires a significant investment of time, and the teachers and students involved have devoted a great deal of time to it. Although we have tried our best to refine this survey, given our limited capacity there are inevitably shortcomings and mistakes, and there is still much room for improvement.

Our ultimate goal is to make this survey a "know-how" technical guidebook for large models, so that the secrets of large models are no longer mysterious and the technical details are no longer hidden. Although we are well aware that the survey is still far from this goal, we are willing to do our best to improve it in future editions. In particular, we welcome readers to contribute ideas and suggestions on pre-training, instruction fine-tuning, the inner workings of prompt engineering, and practical experience. You can submit a PR on GitHub or contact the authors by email. For every adopted technical detail, we will express our thanks by real name, together with the actual contribution, in the acknowledgments section of the paper.

At the same time, we are carrying out experimental explorations (such as capability evaluation and instruction fine-tuning) around some of the contents of the survey, so that its discussions can be grounded in evidence. Due to limited computing power, the experiments we can run are restricted to small-scale models and a small number of comparison methods. We therefore also seek computing power support from the community. We promise that any computing resources obtained will be used entirely for the preparation of this survey, and that all technical experience gained with external computing power will be published in full in the survey. We will acknowledge computing power providers in the acknowledgments section of the survey and on the GitHub project homepage. To support this survey with computing resources, please contact us at [email protected].

Since the publication of the survey, we have received a large number of revision suggestions from many readers, for which we would like to express our gratitude here. We hope you will continue to support and follow our survey of large models; your encouragement and feedback are our greatest motivation.

11. List of participating students for this revision

Student authors: Zhou Kun (added the task setup and result analysis of the instruction fine-tuning experiments and organized the experimental details; added the experimental setup and result analysis of the capability evaluation experiments and assisted in organizing the code; added the experimental setup and result analysis of the prompt guide; added Table 13), Li Junyi (added the instruction fine-tuning experiment datasets, improvement strategies, experimental settings and Table 8; added the capability evaluation models, tasks, datasets and Table 11; added the design principles of the prompt guide and Tables 12 and 14), Tang Tianyi (added textual details to Chapter 5; added Figures 1, 3 and 10 and Tables 6 and 7), Wang Xiaolei (added textual details to Section 6.1 of Chapter 6; added Section 6.3), Hou Yupeng (added textual details to Chapter 4), Min Yingqian (added several models in Chapter 3, the LLaMA-related discussion, and Figure 4), Zhang Beichen (added textual details to Chapters 7 and 9; added Table 10), Dong Zican (added Table 7 and textual details to Chapter 4), Chen Yushuo (experiments in Table 7), Chen Zhipeng (added textual details to Chapters 7 and 9; experiments in Table 11), Jiang Jinhao (updated Figure 8)

Student volunteers: Cheng Xiaoxue (experiments in Table 11), Wang Yuhao (experiments in Table 11), Zheng Bowen (experiments in Table 11), Hu Yiwen (Chinese proofreading), Hou Xinming (Chinese proofreading), Yin Yanbin (Chinese proofreading), Cao Zhanshuo (Chinese proofreading)

Attachment: Update Log

Version / Date / Major updates
V1 (March 31, 2023): Initial version.
V2 (April 9, 2023): Added institutional information. Revised Figure 1 and Table 1 and clarified the corresponding selection criteria for large language models. Improved the writing. Fixed several minor errors.
V3 (April 11, 2023): Fixed an error concerning library resources.
V4 (April 12, 2023): Revised Figure 1 and Table 1 and clarified the release dates of some large language models.
V5 (April 16, 2023): Added a section on the technical evolution of the GPT family of models.
V6 (April 24, 2023): Added some new models to Table 1 and Figure 1. Added a discussion of scaling laws. Added some explanation of the model sizes associated with emergent abilities (Section 2.1). Added illustrations of the attention patterns of different architectures in Figure 4. Added detailed formulas in Table 4.
V7 (April 25, 2023): Fixed some copy errors in figures and tables.
V8 (April 27, 2023): Added the parameter-efficient adaptation section as Section 5.3.
V9 (April 28, 2023): Revised Section 5.3.
V10 (May 7, 2023): Revised Table 1, Table 2 and some other details.
V11 (June 29, 2023): Chapter 1: added Figure 1, the trend in large language model papers published on arXiv. Chapter 2: added Figure 3 showing the evolution of the GPT series, with the corresponding discussion. Chapter 3: added Figure 4 showing the LLaMA family, with the corresponding discussion. Chapter 5: added an updated discussion of synthetic data for instruction tuning in Section 5.1.1, an empirical analysis of instruction tuning in Section 5.1.4, a discussion of parameter-efficient adaptation in Section 5.3, and a discussion of space-efficient adaptation in Section 5.4. Chapter 6: added an updated discussion of the underlying mechanisms of ICL in Section 6.1.3, and a discussion of planning for complex task solving in Section 6.3. Chapter 7: added Table 10 of representative datasets for evaluating the advanced abilities of LLMs in Section 7.2, and added the comprehensive capability evaluation of large language models in Section 7.3.2. Chapter 8: added the prompt design guide. Chapter 9: added a discussion of the applications of large language models in finance and scientific research.



Source: blog.csdn.net/qq_27590277/article/details/131588064