A Summary of Large Language Models (LLMs)

Large language models (LLMs) are currently one of the most important directions in AI/NLP research and industry.

This post summarizes the mainstream large models available today. (Updated 2023-03-19)

In this post, any model with 1B parameters or more is treated as a large model.

Model Overview

Model        Developer    Size            Type              Open source?
LLaMA        Meta AI      7B-65B          Decoder           open
OPT          Meta AI      125M-175B       Decoder           open
T5           Google       220M-11B        Encoder-Decoder   open
mT5          Google       235M-13B        Encoder-Decoder   open
UL2          Google       20B             Encoder-Decoder   open
PaLM         Google       540B            Decoder           no
LaMDA        Google       2B-137B         Decoder           no
FLAN-T5      Google       same as T5      Encoder-Decoder   open
FLAN-UL2     Google       same as UL2     Encoder-Decoder   open
FLAN-PaLM    Google       same as PaLM    Decoder           no
FLAN         Google       same as LaMDA   Decoder           no
BLOOM        BigScience   176B            Decoder           open
T0           BigScience   3B-11B          Encoder-Decoder   open
BLOOMZ       BigScience   same as BLOOM   Decoder           open
mT0          BigScience   same as mT5     Encoder-Decoder   open
GPT-Neo      EleutherAI   125M-2.7B       Decoder           open
GPT-NeoX     EleutherAI   20B             Decoder           open
GPT-3        OpenAI       175B (davinci)  Decoder           no
GPT-4        OpenAI       unknown         Decoder           no
InstructGPT  OpenAI       1.3B            Decoder           no
Alpaca       Stanford     same as LLaMA   Decoder           open

Meta/Facebook AI

  • LLaMA: Open and Efficient Foundation Language Models

https://arxiv.org/pdf/2302.13971v1.pdf

https://github.com/facebookresearch/llama

  • OPT: Open Pre-trained Transformer Language Models

https://arxiv.org/pdf/2205.01068.pdf

https://github.com/facebookresearch/metaseq

Google

  • T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

https://arxiv.org/pdf/1910.10683.pdf

https://github.com/google-research/text-to-text-transfer-transformer

Note: T5's code and models are also open-sourced on the Hugging Face Hub.

https://huggingface.co/google?sort_models=likes#models
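As an illustration, these checkpoints can be loaded directly with the transformers library. Below is a minimal sketch (assuming transformers, sentencepiece, and torch are installed; t5-base is one of the published checkpoint IDs):

```python
# Minimal sketch: load a T5 checkpoint from the Hugging Face Hub and run
# text-to-text generation. Assumes `pip install transformers sentencepiece torch`.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")

# T5 frames every task as text-to-text, typically with a task prefix.
inputs = tokenizer("translate English to German: The house is wonderful.",
                   return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```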

  • mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer

https://arxiv.org/pdf/2010.11934.pdf

https://huggingface.co/models?search=mt5

  • UL2 and Flan-UL2: Unifying Language Learning Paradigms

https://arxiv.org/pdf/2205.05131.pdf

blog:

https://www.yitay.net/blog/flan-ul2-20b

model:

https://huggingface.co/google/ul2

https://huggingface.co/google/flan-ul2

  • PaLM: Scaling Language Modeling with Pathways

https://arxiv.org/pdf/2204.02311.pdf

  • LaMDA: Language Models for Dialog Applications

https://arxiv.org/pdf/2201.08239.pdf

blog:

https://blog.google/technology/ai/lamda/

  • Flan-T5 and Flan-PaLM: Scaling Instruction-Finetuned Language Models

https://arxiv.org/pdf/2210.11416.pdf

https://huggingface.co/google/flan-t5-large

  • Flan: Finetuned Language Models Are Zero-Shot Learners

https://arxiv.org/pdf/2109.01652.pdf

Note: In Google's naming convention, the Flan prefix essentially indicates that the model has been instruction-tuned.
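For illustration, an instruction-tuned (Flan) checkpoint can be prompted with a plain natural-language instruction, without the task-specific prefixes that base T5 expects. A minimal sketch using the publicly released google/flan-t5-large checkpoint:

```python
# Minimal sketch: zero-shot instruction following with an instruction-tuned
# (Flan) checkpoint via the transformers pipeline API.
from transformers import pipeline

generator = pipeline("text2text-generation", model="google/flan-t5-large")
result = generator(
    "Answer the following question: which ocean is larger, the Atlantic or the Pacific?",
    max_new_tokens=20,
)
print(result[0]["generated_text"])
```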

BigScience (a non-profit, open research collective)

  • BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

https://arxiv.org/pdf/2211.05100.pdf

https://huggingface.co/bigscience/bloom

  • T0: Multitask Prompted Training Enables Zero-Shot Task Generalization

https://arxiv.org/pdf/2110.08207.pdf

https://huggingface.co/bigscience/T0

  • BLOOMZ and mT0: multitask-finetuned multilingual versions of BLOOM and mT5 (paper: Crosslingual Generalization through Multitask Finetuning)

https://arxiv.org/pdf/2211.01786.pdf

EleutherAI

  • GPT-Neo

https://github.com/EleutherAI/gpt-neo

  • GPT-NeoX

https://arxiv.org/pdf/2204.06745.pdf

https://huggingface.co/EleutherAI/gpt-neox-20b
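The decoder-only models in this list (GPT-Neo, GPT-NeoX, OPT, BLOOM, etc.) are used through the causal-LM classes instead of the seq2seq ones. A minimal sketch, using the small EleutherAI/gpt-neo-125M checkpoint only because it fits in memory easily:

```python
# Minimal sketch: load a decoder-only (causal) LM and generate a continuation.
# The same code applies to larger checkpoints such as EleutherAI/gpt-neox-20b,
# given sufficient GPU memory.
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-125M")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-125M")

inputs = tokenizer("Large language models are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```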

OpenAI

None of OpenAI's large models from GPT-3 onward have been open-sourced. For the APIs of OpenAI's GPT-series models, see:

九号: "OpenAI API — All GPT Models Explained" (Zhihu article)
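As a rough illustration (not taken from the linked article), a call to the GPT-series API with the official openai Python package of that period looks roughly like the sketch below; the API key and prompt are placeholders:

```python
# Minimal sketch of calling the OpenAI GPT-series API with the openai Python
# package (circa early 2023). The API key and prompt are placeholders.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder; read from an env var in real code

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "In one sentence, what is a large language model?"}],
)
print(response["choices"][0]["message"]["content"])
```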

Stanford

Alpaca is an instruction-tuned version of LLaMA, reported to reach performance comparable to GPT-3.5.

https://github.com/tatsu-lab/stanford_alpaca
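For context, the instruction data in the Alpaca repository is rendered into a fixed prompt template before finetuning. The sketch below paraphrases the no-input variant of that template (see the repository for the authoritative wording):

```python
# Sketch of an Alpaca-style prompt template used to format instruction data
# for finetuning (wording paraphrased; the exact template is defined in the
# stanford_alpaca repository).
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)

def build_prompt(instruction: str) -> str:
    """Fill the template with a single instruction (no-input variant)."""
    return PROMPT_TEMPLATE.format(instruction=instruction)

print(build_prompt("Give three tips for staying healthy."))
```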

Latest: A Summary of Open-Source Prompt/Instruct Tuning Data

九号: "A Summary of Openly Available Instruct/Prompt Tuning Data" (Zhihu article)

If there are large models not covered in this post, readers are welcome to mention them in the comments.


Reposted from blog.csdn.net/bruce__ray/article/details/131123673