A very detailed review of large language models that opens up the big picture! A Survey of Large Language Models



Paper link
Project link

1. Introduction

The paper is easy to follow and well organized, with rich content. It covers most of the high-profile AI developments since ChatGPT, repeatedly touches on artificial general intelligence (AGI), and introduces the large language models (LLMs) of recent years in detail. Highly recommended for readers interested in large models and AGI!

2. Abstract and Introduction

Since the Turing test was proposed, humans have been exploring how to make machines master language intelligence.

Language models have been studied extensively over the past 20 years, evolving from statistical language models to neural-network-based language models (LSTMs, etc.).

In recent years, pre-trained language models (PLMs), obtained by pre-training Transformer models on large-scale corpora, have shown strong ability on a wide range of natural language processing (NLP) tasks.

In the past year or two (starting with GPT-3 in 2020), researchers have found that once the parameter scale exceeds a certain level, these scaled-up language models not only achieve significant performance gains but also exhibit special abilities (such as in-context learning) that do not exist in smaller models such as BERT. To distinguish language models at different parameter scales, the research community coined the term large language model (LLM) for PLMs of significant size (e.g., tens or hundreds of billions of parameters).

In the past six months, the launch of ChatGPT (a powerful LLM-based AI chatbot) has attracted widespread attention from society.

In general, language models (LMs) have gone through the following four stages:

  1. SLM (statistical language model): e.g., predicting the next word under a Markov assumption, as in n-gram models (see the bigram sketch after this list).
  2. NLM (neural language model): language models based on neural networks such as RNNs and LSTMs.
  3. PLM (pre-trained language model): e.g., GPT-1, GPT-2, and BERT. The difference from NLMs is that the language model becomes a "train once, use for many tasks" artifact: a single pre-trained model can serve many downstream tasks without complex task-specific fine-tuning for each one. GPT-2 in particular cast tasks in zero-shot form, which greatly enhanced the capability of pre-trained language models.
  4. LLM (large language model): e.g., GPT-3, PaLM, ChatGPT, LLaMA, and GPT-4. The most visible difference from PLMs is that the models are larger and trained on more data.
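
To make stage 1 concrete, here is a minimal bigram model sketch under the Markov assumption. The toy corpus, the start/end tokens, and the unsmoothed maximum-likelihood estimate are illustrative choices, not details from the survey.

```python
# Minimal sketch of a statistical (bigram) language model under the Markov
# assumption: the next word depends only on the previous word.
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count bigram frequencies over a whitespace-tokenized corpus."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts

def next_word_prob(counts, prev, word):
    """P(word | prev) by maximum-likelihood estimation (no smoothing)."""
    total = sum(counts[prev].values())
    return counts[prev][word] / total if total else 0.0

counts = train_bigram(["the model predicts the next word",
                       "the model reads the corpus"])
print(next_word_prob(counts, "the", "model"))  # 0.5: "the" is followed by
                                               # model, next, model, corpus
```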

The authors highlight three changes brought about by the emergence of LLMs:

  1. LLMs exhibit emergent abilities absent in PLMs. GPT-3 was the first model to scale to 100+ billion parameters, and at that scale it exhibits capabilities that do not appear in smaller models; the same is true of ChatGPT today.
  2. LLMs change how language models are used. Previously, people chose or adapted an existing LM to solve a specific task; now they simply tell the LLM what to do, and the LLM solves the problem as instructed.
  3. The balance between academia and industry has shifted. AI progress used to be driven mainly by academia, but LLM development is now led by industry, since training LLMs requires enormous funding; the earliest breakthroughs came from OpenAI rather than from universities.

The authors also discuss the connection between LLMs and AGI.

OpenAI has laid out a plan for achieving AGI.

Recent work, "Sparks of Artificial General Intelligence: Early experiments with GPT-4", likewise argues that GPT-4 already exhibits certain AGI capabilities.

The authors review the LLM-related literature and maintain a companion project on GitHub.

3. Review

Background

Existing LLMs are still based on the Transformer architecture.
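
As a concrete picture of that building block, here is a minimal single-head causal self-attention sketch in PyTorch. The tensor shapes are arbitrary, and multi-head projection, residual connections, and layer normalization are omitted for brevity; this is an expository sketch, not a reference implementation from the survey.

```python
# Minimal single-head causal self-attention (the core op of decoder LLMs).
# Shapes are illustrative; multi-head, residuals, and LayerNorm are omitted.
import torch
import torch.nn.functional as F

def causal_self_attention(x, w_q, w_k, w_v):
    """x: (seq_len, d_model); w_*: (d_model, d_head) projection matrices."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / k.shape[-1] ** 0.5          # scaled dot-product
    mask = torch.triu(torch.ones_like(scores), 1)  # 1s strictly above diagonal
    scores = scores.masked_fill(mask.bool(), float("-inf"))  # hide the future
    return F.softmax(scores, dim=-1) @ v

x = torch.randn(8, 16)                             # 8 tokens, d_model = 16
w_q, w_k, w_v = (torch.randn(16, 16) for _ in range(3))
out = causal_self_attention(x, w_q, w_k, w_v)      # shape (8, 16)
```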

Emergent abilities of LLMs

  • in-context learning (illustrated after this list)
  • instruction following
  • step-by-step reasoning
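
In-context learning means the model picks up a task purely from demonstrations placed in the prompt, with no gradient updates. The few-shot translation prompt below uses the classic example pairs from the GPT-3 paper:

```python
# Few-shot prompt for in-context learning: the task (English -> French) is
# specified only through demonstrations; no fine-tuning takes place.
few_shot_prompt = """Translate English to French:
sea otter => loutre de mer
peppermint => menthe poivrée
cheese =>"""
# A sufficiently large LLM is expected to complete this with "fromage",
# inferring the translation task from the demonstrations alone.
```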

Key technologies of LLMs

  • Scaling: given a fixed compute budget, how to trade off model size against dataset size to improve performance (see the sketch after this list)
  • Training: how to reduce training cost
  • Ability eliciting: how to draw out the abilities the model already possesses
  • Alignment tuning: reducing harmful output
  • Tool use: e.g., letting the LLM call a calculator to make up for its weak arithmetic
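
One concrete reading of the scaling point is the compute-optimal rule of thumb from the Chinchilla work (Hoffmann et al., 2022): training compute is roughly C ≈ 6·N·D FLOPs for N parameters and D tokens, and N and D should grow together with D ≈ 20·N. The constants below come from that paper, not from this survey.

```python
# Chinchilla-style compute-optimal allocation (assumed constants from
# Hoffmann et al., 2022): C ≈ 6 * N * D FLOPs and D ≈ 20 * N tokens.
def compute_optimal(flops_budget, tokens_per_param=20.0):
    """Split a FLOPs budget into model parameters N and training tokens D."""
    n_params = (flops_budget / (6.0 * tokens_per_param)) ** 0.5  # C = 120*N^2
    return n_params, tokens_per_param * n_params

n, d = compute_optimal(5.76e23)  # roughly Chinchilla's training budget
print(f"params ~ {n:.1e}, tokens ~ {d:.1e}")  # ~6.9e10 params, ~1.4e12 tokens
```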

Development timeline of LLM models [figure omitted]

Summary of LLM models [figure omitted]

In other respects, the survey's descriptions of the individual LLM models are not as clear as some other write-ups available online.

However, the authors provide links to many related papers on GitHub.

Datasets [figure omitted]

Dataset distribution used by each model [figure omitted]

Data processing pipeline [figure omitted]
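
As a rough sketch of what such a pipeline involves, the snippet below chains a length-based quality filter, email masking for privacy, and exact de-duplication; all three heuristics are illustrative placeholders, not the filtering rules from the paper.

```python
# Toy pre-training data pipeline: quality filter -> privacy scrub -> dedup.
# Each heuristic is a simplified stand-in for production-grade filtering.
import hashlib
import re

def clean_corpus(docs, min_words=20):
    seen = set()
    for doc in docs:
        if len(doc.split()) < min_words:           # drop very short documents
            continue
        doc = re.sub(r"\S+@\S+", "<EMAIL>", doc)   # mask email addresses
        digest = hashlib.md5(doc.encode("utf-8")).hexdigest()
        if digest in seen:                         # skip exact duplicates
            continue
        seen.add(digest)
        yield doc

docs = ["too short",
        "word " * 25 + "contact me at a@b.com",
        "word " * 25 + "contact me at a@b.com"]    # exact duplicate
print(list(clean_corpus(docs)))                    # one cleaned doc survives
```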
Model architectures [figure omitted]

Optimization settings [figure omitted]

4. Model tuning

Process of constructing instruction data [figure omitted]
Instruction datasets [figure omitted]
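
For orientation, instruction data is often stored in the instruction/input/output layout popularized by projects such as Alpaca; the field names and the example below are a common community convention, not a format prescribed by the survey.

```python
# One instruction-tuning example in the common instruction/input/output
# layout (field names follow a widespread convention, e.g. Alpaca).
example = {
    "instruction": "Summarize the following paragraph in one sentence.",
    "input": "Large language models are pre-trained on massive corpora ...",
    "output": "LLMs acquire general language ability from large-scale pre-training.",
}
# For fine-tuning, the fields are typically concatenated into a single prompt
# and the model is trained to generate the "output" text.
```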

The RLHF algorithm (as used by InstructGPT) [figure omitted]
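
In InstructGPT-style RLHF, a reward model is first trained on human preference pairs: for a prompt x, the reward of the preferred response y_w should exceed that of the rejected response y_l. The sketch below shows that pairwise ranking loss, with placeholder tensors standing in for actual reward-model outputs; the learned reward then drives the PPO stage that fine-tunes the policy model.

```python
# Pairwise ranking loss for the RLHF reward model (InstructGPT-style):
# loss = -log sigmoid(r(x, y_w) - r(x, y_l)), averaged over preference pairs.
import torch
import torch.nn.functional as F

def reward_ranking_loss(r_chosen, r_rejected):
    """r_chosen / r_rejected: scalar rewards, one per preference pair."""
    return -F.logsigmoid(r_chosen - r_rejected).mean()

r_w = torch.tensor([1.2, 0.3])   # rewards for human-preferred responses
r_l = torch.tensor([0.4, -0.1])  # rewards for rejected responses
print(reward_ranking_loss(r_w, r_l))  # small when preferred > rejected
```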

5. Evaluation

[figure omitted]

To be continued.


Source: blog.csdn.net/a1920993165/article/details/130139346