YUAN 2.0: A Large Language Model with Localized Filtering-based Attention

Enterprise Development, 2024-01-08 19:01:09

This post is part of a series on LLMs. It is a translation of the paper "YUAN 2.0: A Large Language Model with Localized Filtering-based Attention".

Abstract
1 Introduction
2 Related Work
3 Method
4 Results and Analysis
5 Conclusion

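Abstract

In this work, we develop and release Yuan 2.0, a series of large language models with 2.1 billion to 102.6 billion parameters. Localized Filtering-based Attention (LFA) is introduced to incorporate prior knowledge of the local dependencies of natural language into attention. To build high-quality pre-training and fine-tuning datasets, a data filtering and generation system is proposed. A distributed training method that combines non-uniform pipeline parallelism, data parallelism, and optimizer parallelism is also proposed; it greatly reduces the bandwidth requirements of intra-node communication and achieves good performance in large-scale distributed training. Compared with existing models, the Yuan 2.0 models show impressive ability in code generation, math problem solving, and chat. The latest version of YUAN 2.0, including model weights and source code, is available on GitHub.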
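The abstract does not spell out how LFA is built, so the following is only a minimal sketch of the general idea of adding a local-dependency prior to attention, not the paper's implementation. It assumes that token vectors are mixed with their immediate predecessors by a one-sided (causal) filter before queries and keys are formed. The function names, the fixed filter weights, and the absence of learned projections are all hypothetical simplifications.

```python
import math


def causal_local_filter(xs, weights):
    """Mix each token vector with its predecessors over a one-sided window.

    weights[0] applies to the current token, weights[1] to the previous one, etc.
    (Hypothetical fixed weights; a real model would learn them.)
    """
    out = []
    for t in range(len(xs)):
        mixed = [0.0] * len(xs[t])
        for k, w in enumerate(weights):
            if t - k < 0:
                break  # nothing before the start of the sequence
            for d, v in enumerate(xs[t - k]):
                mixed[d] += w * v
        out.append(mixed)
    return out


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(queries, keys, values):
    """Scaled dot-product attention where position t sees only positions <= t."""
    dim = len(queries[0])
    out = []
    for t, q in enumerate(queries):
        scores = [
            sum(a * b for a, b in zip(q, keys[s])) / math.sqrt(dim)
            for s in range(t + 1)
        ]
        probs = softmax(scores)
        out.append([
            sum(p * values[s][d] for s, p in enumerate(probs))
            for d in range(len(values[0]))
        ])
    return out


def locally_filtered_attention(xs, weights):
    # Assumption: queries and keys use the filtered sequence, values the raw one.
    filtered = causal_local_filter(xs, weights)
    return causal_attention(filtered, filtered, xs)


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
filtered = causal_local_filter(tokens, [0.5, 0.5])
output = locally_filtered_attention(tokens, [0.5, 0.5])
```

In this sketch the filter looks only backwards, so the causal property needed for autoregressive generation is preserved; each query and key already carries information about its neighbouring tokens before attention weights are computed.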

1 Introduction

2 Related Work

3 Method

4 Results and Analysis

5 Conclusion

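In this work, we introduce Yuan 2.0, a series of large language models with 2.1 billion to 102.6 billion parameters. The Yuan 2.0 architecture is designed by combining attention with localized filtering, which yields better accuracy than vanilla attention. The proposed distributed training method, which combines non-uniform pipeline parallelism, data parallelism, and optimizer parallelism, greatly reduces the bandwidth requirements of intra-node communication and performs well in large-scale distributed training. Compared with existing models, the Yuan 2.0 models show good ability in code generation, math, and chat. We plan to improve Yuan 2.0 incrementally in future work.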

Reposted from blog.csdn.net/c_cpp_csharp/article/details/135404349
