Large language models (LLMs) are one of the most important directions in current AI/NLP research and industry.
This article summarizes the current mainstream large models. (*Updated 2023.03.19)
Throughout this article, a model with more than 1B parameters is regarded as a large model.
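To give a rough sense of what these parameter counts mean in practice, weight memory scales linearly with parameter count: each parameter takes 2 bytes in fp16. A small back-of-the-envelope sketch (my own illustration, not from any of the papers below; real usage also includes activations and framework overhead):

```python
# Rough memory footprint of model weights by parameter count and dtype.
# Illustrative only: actual memory use also includes activations,
# optimizer state, and framework overhead.

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}

def weight_memory_gb(num_params: float, dtype: str = "fp16") -> float:
    """Memory needed to hold the weights alone, in GB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

# A 7B model (e.g. the smallest LLaMA) in fp16:
print(weight_memory_gb(7e9))    # 14.0 GB
# A 175B model (e.g. GPT-3 davinci) in fp16:
print(weight_memory_gb(175e9))  # 350.0 GB
```

This is why the 1B threshold matters: even the smallest models in the table below already need multiple GB of accelerator memory just to load.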
Model list
Model | Author | Size | Type | Open source? |
---|---|---|---|---|
LLaMA | Meta AI | 7B-65B | Decoder | open |
OPT | Meta AI | 125M-175B | Decoder | open |
T5 | Google | 220M-11B | Encoder-Decoder | open |
mT5 | Google | 235M-13B | Encoder-Decoder | open |
UL2 | Google | 20B | Encoder-Decoder | open |
PaLM | Google | 540B | Decoder | no |
LaMDA | Google | 2B-137B | Decoder | no |
FLAN-T5 | Google | Same as T5 | Encoder-Decoder | open |
FLAN-UL2 | Google | Same as UL2 | Encoder-Decoder | open |
FLAN-PaLM | Google | Same as PaLM | Decoder | no |
FLAN | Google | Same as LaMDA | Decoder | no |
BLOOM | BigScience | 176B | Decoder | open |
T0 | BigScience | 3B-11B | Encoder-Decoder | open |
BLOOMZ | BigScience | Same as BLOOM | Decoder | open |
mT0 | BigScience | Same as mT5 | Encoder-Decoder | open |
GPT-Neo | EleutherAI | 125M-2.7B | Decoder | open |
GPT-NeoX | EleutherAI | 20B | Decoder | open |
GPT-3 | OpenAI | 175B (davinci) | Decoder | no |
GPT-4 | OpenAI | unknown | unknown | no |
InstructGPT | OpenAI | 1.3B | Decoder | no |
Alpaca | Stanford | Same as LLaMA (7B) | Decoder | open |
Meta/Facebook AI
- LLaMA: Open and Efficient Foundation Language Models
https://arxiv.org/pdf/2302.13971v1.pdf
https://github.com/facebookresearch/llama
- OPT: Open Pre-trained Transformer Language Models
https://arxiv.org/pdf/2205.01068.pdf
Google
- T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
https://arxiv.org/pdf/1910.10683.pdf
Note: The code and models of T5 are also open source on the Hugging Face platform:
https://huggingface.co/google
- mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer
https://arxiv.org/pdf/2010.11934.pdf
https://huggingface.co/models?search=mt5
- UL2 and Flan-UL2: Unifying Language Learning Paradigms
https://arxiv.org/pdf/2205.05131.pdf
blog:
https://www.yitay.net/blog/flan-ul2-20b
model:
https://huggingface.co/google/ul2
https://huggingface.co/google/flan-ul2
- PaLM: Scaling Language Modeling with Pathways
https://arxiv.org/pdf/2204.02311.pdf
- LaMDA: Language Models for Dialog Applications
https://arxiv.org/pdf/2201.08239.pdf
blog:
https://blog.google/technology/ai/lamda/
- Flan-T5 and Flan-PaLM: Scaling Instruction-Finetuned Language Models
https://arxiv.org/pdf/2210.11416.pdf
https://huggingface.co/google/flan-t5-large
- Flan: Finetuned Language Models Are Zero-Shot Learners
https://arxiv.org/pdf/2109.01652.pdf
**Note:** In Google's naming scheme, the Flan prefix generally indicates that the model has been instruction-tuned.
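To make "instruction-tuned" concrete: instruction tuning rewrites ordinary supervised examples as natural-language instructions before fine-tuning, so the model learns to follow task descriptions rather than a fixed input format. A minimal sketch of such a conversion for an NLI example (the template wording here is my own illustration, not the exact FLAN template):

```python
# Sketch: turning a plain NLI example into an instruction-style training
# pair, the kind of data used to instruction-tune the Flan models.
# The prompt wording below is illustrative, not the exact FLAN template.

def to_instruction_example(premise: str, hypothesis: str, label: str) -> dict:
    prompt = (
        f"Premise: {premise}\n"
        f"Hypothesis: {hypothesis}\n"
        "Does the premise entail the hypothesis? Answer yes, no, or maybe."
    )
    return {"input": prompt, "target": label}

ex = to_instruction_example(
    premise="A dog is running in the park.",
    hypothesis="An animal is outside.",
    label="yes",
)
print(ex["input"])
print(ex["target"])  # yes
```

In practice, FLAN-style tuning applies many such templates across hundreds of tasks, which is what gives the Flan models their zero-shot instruction-following behavior.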
BigScience (a non-profit open research collaboration)
- BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
https://arxiv.org/pdf/2211.05100.pdf
https://huggingface.co/bigscience/bloom
- T0: Multitask Prompted Training Enables Zero-Shot Task Generalization
https://arxiv.org/pdf/2110.08207.pdf
https://huggingface.co/bigscience/T0
- BLOOMZ and mT0: multilingual, multitask-finetuned versions of BLOOM and mT5
https://arxiv.org/pdf/2211.01786.pdf
EleutherAI
- GPT-Neo
https://github.com/EleutherAI/gpt-neo
- GPT-NeoX
https://arxiv.org/pdf/2204.06745.pdf
https://huggingface.co/EleutherAI/gpt-neox-20b
OpenAI
OpenAI's large models have not been open-sourced since GPT-3. For the API of OpenAI's GPT-series models, see:
No. 9: OpenAI API Detailed Explanation of All GPT Models
Stanford
Alpaca is an instruction-fine-tuned version of LLaMA; its authors report performance approaching the GPT-3.5 level.
https://github.com/tatsu-lab/stanford_alpaca
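Alpaca's training data is a list of {instruction, input, output} records that are rendered into a single prompt string before fine-tuning. A minimal sketch of that rendering step (the template wording approximates the one in the stanford_alpaca repo; consult the repo for the exact text):

```python
# Sketch: rendering an Alpaca-style {instruction, input, output} record
# into a fine-tuning prompt. The wording approximates the template in
# the stanford_alpaca repo; check the repo for the exact text.

TEMPLATE_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)

TEMPLATE_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)

def render_prompt(record: dict) -> str:
    """Pick the template depending on whether the record has an `input` field."""
    if record.get("input"):
        return TEMPLATE_WITH_INPUT.format(
            instruction=record["instruction"], input=record["input"]
        )
    return TEMPLATE_NO_INPUT.format(instruction=record["instruction"])

print(render_prompt({"instruction": "Name three primary colors.", "input": ""}))
```

The model's `output` field is then appended after `### Response:` as the training target, so at inference time the same prompt format elicits an instruction-following completion.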
Latest: a summary of open-source prompt/instruction-tuning datasets.
If any large models are missing from this article, readers are welcome to mention them in the comments.