ALiBi: Attention With Linear Biases Enables Input Length Extrapolation

Industry news, 2023-08-06 14:33:22

Introduction
Method
Result
References

Introduction

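Suppose a model is trained on sequences of 512 tokens. How well it performs at inference time on longer sequences is called its extrapolation ability. The authors show that with earlier position methods such as sinusoidal embeddings, Rotary, and the T5 bias, extrapolation gets steadily worse as the inference length grows. To address this, they propose ALiBi, as shown in the figure below:
[Figure: perplexity as the inference length increases, ALiBi compared with the other position methods]
Compared with the other position methods, ALiBi keeps perplexity roughly unchanged as the number of inference tokens increases.
ALiBi also trains and runs inference faster than the T5 bias and Rotary, at about the same speed as sinusoidal embeddings. The 11% memory saving reported in the paper comes from training on shorter inputs: a 1.3B-parameter ALiBi model trained on 1024-token sequences reaches the same perplexity at 2048 tokens as a sinusoidal model trained on 2048-token sequences, while using 11% less memory.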

Method

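[Figure: ALiBi adds a linear bias, scaled by a per-head slope m, to the query-key attention scores]

ALiBi is very simple. As the figure above shows, when the attention scores are computed, the score for each earlier position is penalized in proportion to its distance from the current position. For example, when attention is computed for q3, it attends to k1 and k2 as well as k3: the q3·k1 score gets -2, the q3·k2 score gets -1, and the q3·k3 score is left unchanged. These penalties are then multiplied by a slope m. The authors found that m does not need to be tuned for different datasets and can stay fixed once chosen. It is set differently for each head: in the paper, for a model with n heads the slopes form a geometric sequence that starts at 2^(-8/n) and uses that same value as its ratio, so 8 heads get 1/2, 1/4, ..., 1/256.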
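The penalty described above can be sketched in a few lines of plain Python. This is only an illustration, not the authors' code: the function names (`alibi_slopes`, `alibi_bias`, `alibi_attention_weights`) are made up for this post, the head count is assumed to be a power of two, and causal masking is assumed so that a query never attends to later positions. A real implementation would use tensor operations rather than nested lists.

```python
import math


def alibi_slopes(n_heads):
    """Per-head slopes m: a geometric sequence starting at 2**(-8/n_heads)
    with that same value as its ratio (assumes n_heads is a power of two)."""
    start = 2.0 ** (-8.0 / n_heads)
    return [start ** (h + 1) for h in range(n_heads)]


def alibi_bias(seq_len, slope):
    """Bias matrix for one head: entry [i][j] is slope * (j - i) for j <= i
    (zero on the diagonal, more negative for more distant keys), and -inf
    for j > i, which acts as the causal mask (an assumption of this sketch)."""
    return [
        [slope * (j - i) if j <= i else float("-inf") for j in range(seq_len)]
        for i in range(seq_len)
    ]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def alibi_attention_weights(scores, slope):
    """Add the ALiBi bias to raw q.k scores (a square list of lists),
    then apply softmax to each row."""
    bias = alibi_bias(len(scores), slope)
    return [
        softmax([s + b for s, b in zip(score_row, bias_row)])
        for score_row, bias_row in zip(scores, bias)
    ]


slopes = alibi_slopes(8)          # 1/2, 1/4, ..., 1/256
bias = alibi_bias(3, slopes[0])   # bias for the first head, 3 tokens
print(bias[2])                    # row for q3: penalties -2 and -1 times m = 0.5
```

With m = 0.5, the row for q3 holds the -2 and -1 penalties from the example above after scaling by the slope. Since the bias depends only on the distance between query and key, the same function works for any sequence length, including lengths longer than those seen in training.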

Result

[Figure: results reported in the paper]

References

https://arxiv.org/pdf/2108.12409.pdf

Reposted from blog.csdn.net/qq_18555105/article/details/131442418
