Large Language Model (LLM) Summary

Large models (Large Language Models, LLMs) are among the most important directions in current AI/NLP research and industry.

This article summarizes the current mainstream large models. (Updated on 2023.03.19)

In this article, a model with more than 1B parameters is regarded as a large model.
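
As a rough illustration, a checkpoint's parameter count can be checked against this 1B cutoff with a few lines of Hugging Face transformers code (a minimal sketch; "gpt2" is only a stand-in checkpoint):

```python
# Rough illustration: check a checkpoint's parameter count against the
# 1B cutoff used in this article. A minimal sketch with Hugging Face
# transformers; "gpt2" (~124M parameters) is only a stand-in checkpoint.
from transformers import AutoModel

model = AutoModel.from_pretrained("gpt2")
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e9:.2f}B parameters")  # ~0.12B, i.e. below the cutoff
```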

Model list

| Model | Author | Size | Type | Open source? |
| --- | --- | --- | --- | --- |
| LLaMA | Meta AI | 7B-65B | Decoder | open |
| OPT | Meta AI | 125M-175B | Decoder | open |
| T5 | Google | 220M-11B | Encoder-Decoder | open |
| mT5 | Google | 235M-13B | Encoder-Decoder | open |
| UL2 | Google | 20B | Encoder-Decoder | open |
| PaLM | Google | 540B | Decoder | no |
| LaMDA | Google | 2B-137B | Decoder | no |
| Flan-T5 | Google | Same as T5 | Encoder-Decoder | open |
| Flan-UL2 | Google | Same as UL2 | Encoder-Decoder | open |
| Flan-PaLM | Google | Same as PaLM | Decoder | no |
| Flan | Google | Same as LaMDA | Decoder | no |
| BLOOM | BigScience | 176B | Decoder | open |
| T0 | BigScience | 3B | Encoder-Decoder | open |
| BLOOMZ | BigScience | Same as BLOOM | Decoder | open |
| mT0 | BigScience | Same as T0 | Encoder-Decoder | open |
| GPT-Neo | EleutherAI | 125M-2.7B | Decoder | open |
| GPT-NeoX | EleutherAI | 20B | Decoder | open |
| GPT-3 | OpenAI | 175B (davinci) | Decoder | no |
| GPT-4 | OpenAI | unknown | unknown | no |
| InstructGPT | OpenAI | 1.3B | Decoder | no |
| Alpaca | Stanford | Same as LLaMA | Decoder | open |

Meta/Facebook AI

  • LLaMA: Open and Efficient Foundation Language Models

https://arxiv.org/pdf/2302.13971v1.pdf

https://github.com/facebookresearch/llama

  • OPT: Open Pre-trained Transformer Language Models

https://arxiv.org/pdf/2205.01068.pdf

https://github.com/facebookresearch/metaseq

Google

  • T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

https://arxiv.org/pdf/1910.10683.pdf

https://github.com/google-research/text-to-text-transfer-transformer

Note: T5's code and models are also available on the Hugging Face Hub; a usage sketch follows the link below.

https://huggingface.co/google
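
For example, a T5 checkpoint can be loaded and run straight from the Hub. This is a minimal sketch with the transformers library; "t5-small" stands in for the larger 220M-11B variants:

```python
# Minimal sketch: load and run a T5 checkpoint from the Hugging Face Hub.
# "t5-small" (~60M parameters) stands in for the larger 220M-11B variants.
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# T5 casts every task as text-to-text; the task is chosen via an input prefix.
inputs = tokenizer("translate English to German: The house is wonderful.",
                   return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```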

  • mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer

https://arxiv.org/pdf/2010.11934.pdf

https://huggingface.co/models?search=mt5

  • UL2 and Flan-UL2: Unifying Language Learning Paradigms

https://arxiv.org/pdf/2205.05131.pdf

blog:

https://www.yitay.net/blog/flan-ul2-20b

model:

https://huggingface.co/google/ul2

https://huggingface.co/google/flan-ul2

  • PaLM: Scaling Language Modeling with Pathways

https://arxiv.org/pdf/2204.02311.pdf

  • LaMDA: Language Models for Dialog Applications

https://arxiv.org/pdf/2201.08239.pdf

blog:

https://blog.google/technology/ai/lamda/

  • Flan-T5 and Flan-PaLM: Scaling Instruction-Finetuned Language Models

https://arxiv.org/pdf/2210.11416.pdf

https://huggingface.co/google/flan-t5-large

  • Flan: Finetuned Language Models Are Zero-Shot Learners

https://arxiv.org/pdf/2109.01652.pdf

Note: In Google's naming scheme, the Flan prefix indicates that the model has undergone instruction tuning.
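
To see what this means in practice, an instruction-tuned Flan checkpoint can be prompted with a bare natural-language instruction, with no task prefix and no few-shot examples. A minimal sketch; "google/flan-t5-small" stands in for the larger variants:

```python
# Minimal sketch: prompt an instruction-tuned Flan checkpoint with a bare
# natural-language instruction -- no task prefix, no few-shot examples.
# "google/flan-t5-small" stands in for the larger Flan-T5 variants.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-small")

inputs = tokenizer("Answer the following question: what is the capital of France?",
                   return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```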

BigScience (a non-profit research collaboration)

  • BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

https://arxiv.org/pdf/2211.05100.pdf

https://huggingface.co/bigscience/bloom

  • T0: Multitask Prompted Training Enables Zero-Shot Task Generalization

https://arxiv.org/pdf/2110.08207.pdf

https://huggingface.co/bigscience/T0

  • BLOOMZ and mT0: Crosslingual Generalization through Multitask Finetuning (multilingual instruction-tuned versions of BLOOM and mT5)

https://arxiv.org/pdf/2211.01786.pdf

EleutherAI

  • GPT-Neo

https://github.com/EleutherAI/gpt-neo

  • GPT-NeoX

https://arxiv.org/pdf/2204.06745.pdf

https://huggingface.co/EleutherAI/gpt-neox-20b
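
Both EleutherAI families load as ordinary causal language models through transformers. A minimal sketch; the 125M GPT-Neo checkpoint stands in for gpt-neox-20b, which needs tens of GB of memory:

```python
# Minimal sketch: text generation with an EleutherAI causal LM.
# "EleutherAI/gpt-neo-125M" stands in for gpt-neox-20b, which needs tens
# of GB of memory; the pipeline call is the same for both checkpoints.
from transformers import pipeline

generator = pipeline("text-generation", model="EleutherAI/gpt-neo-125M")
result = generator("Large language models are", max_new_tokens=30, do_sample=True)
print(result[0]["generated_text"])
```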

OpenAI

OpenAI has not open-sourced its large models since GPT-3. For the APIs of OpenAI's GPT-series models, see:

No. 9: OpenAI API Detailed Explanation of All GPT Models
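
For reference, here is a minimal sketch of calling the gpt-3.5-turbo chat endpoint with the openai Python package (the v0.x interface current as of this article's update date; the API key is a placeholder):

```python
# Minimal sketch: query gpt-3.5-turbo through the openai Python package
# (the v0.x interface current as of this article's update date).
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder -- substitute your own key

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Briefly explain what an LLM is."}],
)
print(response["choices"][0]["message"]["content"])
```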

Stanford

Alpaca is an instruction-fine-tuned version of LLaMA; its quality is reported to approach that of GPT-3.5 (text-davinci-003).

https://github.com/tatsu-lab/stanford_alpaca
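
For context, Alpaca formats its 52K self-instruct training examples with a fixed prompt template. The template below is reproduced (no-input variant) from the stanford_alpaca repo; format_example is only an illustrative helper, not part of the repo's API:

```python
# The fixed prompt template Alpaca uses to format its 52K instruction
# examples (no-input variant), reproduced from the stanford_alpaca repo;
# format_example is an illustrative helper, not part of the repo's API.
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:"
)

def format_example(instruction: str) -> str:
    return PROMPT_NO_INPUT.format(instruction=instruction)

print(format_example("Give three tips for staying healthy."))
```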

Latest: a summary of open-source Prompt/Instruct Tuning data:

No. 9: A Summary of Open-Source Instruct/Prompt Tuning Data

Note: If any large models are missing from this article, readers are welcome to mention them in the comments.
