Study Notes on the Transformer in Deep Learning

2023-12-17 04:30:48

Contents

1. Transformer vs. RNN

1.1 Drawbacks of RNN models

1.2 Advantages of the Transformer

2. Structure of the Transformer

2.1 How the attention mechanism works

The starting point is the well-known paper "Attention Is All You Need": https://arxiv.org/abs/1706.03762

1. Transformer vs. RNN

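RNN model (Recurrent Neural Network): RNNs are typically used for sequence tasks such as time-series prediction. Information about the past is carried in the hidden state: $h_t$ is computed from $h_{t-1}$ and the current input, so the state at each time step has to carry a large amount of information.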
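
The recurrence can be sketched in a few lines. This is a minimal illustration, assuming a scalar hidden state and a tanh activation; the names `rnn_step`, `run_rnn`, `w_h`, `w_x`, `b` and all numeric values are assumptions made for the example, not something taken from these notes.

```python
import math

def rnn_step(h_prev, x_t, w_h=0.5, w_x=1.0, b=0.0):
    """One step of a scalar vanilla RNN: h_t = tanh(w_h * h_{t-1} + w_x * x_t + b)."""
    return math.tanh(w_h * h_prev + w_x * x_t + b)

def run_rnn(xs, h0=0.0):
    """Process the sequence strictly in order; each h_t depends on h_{t-1}."""
    h = h0
    states = []
    for x_t in xs:
        h = rnn_step(h, x_t)
        states.append(h)
    return states

states = run_rnn([1.0, 0.0, -1.0])
```

Because every step needs the previous state, the loop cannot be parallelized across time steps, which is one reason training is slow.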

1.1 Drawbacks of RNN models

Slow to train
Long sequences lead to vanishing/exploding gradients
LSTM mitigates the gradient problem but is even slower to train
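
The vanishing/exploding gradient problem can be seen with a toy calculation. The sketch below assumes the simplest possible recurrence, a linear scalar one with $h_t = w \cdot h_{t-1}$, so the gradient of the last state with respect to the first is just $w$ multiplied by itself once per step. The function name and the values 0.9, 1.1 and 100 are illustrative assumptions.

```python
def gradient_through_time(w, steps):
    """For the linear recurrence h_t = w * h_{t-1}, d h_T / d h_0 = w ** steps."""
    grad = 1.0
    for _ in range(steps):
        grad *= w  # chain rule: one factor of w per time step
    return grad

vanishing = gradient_through_time(0.9, 100)  # shrinks toward zero
exploding = gradient_through_time(1.1, 100)  # grows without bound
```

Over 100 steps the first gradient falls to the order of 1e-5 while the second grows to the order of 1e4, even though both weights are close to 1.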

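1.2 Advantages of the Transformer

The attention mechanism has an infinite reference window: in principle it can attend to every position in the sequence, however far back, though in practice the window is bounded by memory and compute.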

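2. Structure of the Transformer

A convolutional network computes over a small local window, so relating two positions that are far apart requires stacking many layers before their information meets. A Transformer, by contrast, can look at all positions at once.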
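
To put a rough number on this, the sketch below assumes stride-1, undilated 1-D convolutions, for which the receptive field after L layers with kernel size k is 1 + L(k - 1). The helper `conv_layers_needed` and the example values are hypothetical and only serve the illustration.

```python
import math

def conv_layers_needed(distance, kernel_size):
    """Number of stacked stride-1, undilated 1-D conv layers needed before one
    output position can depend on two inputs `distance` positions apart."""
    # Receptive field after L layers is 1 + L * (kernel_size - 1); it must be
    # at least distance + 1 to cover both positions.
    return math.ceil(distance / (kernel_size - 1))

layers_for_conv = conv_layers_needed(distance=100, kernel_size=3)
layers_for_attention = 1  # self-attention relates any two positions in one layer
```

Under these assumptions, two tokens 100 positions apart need 50 stacked convolution layers with kernel size 3, against a single self-attention layer.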

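The architecture diagram from the paper (Figure 1):

In the diagram, the encoder is on the left and the decoder is on the right.

Compared with BatchNorm, LayerNorm is used more often here because in sequence data the length of each sample can vary, which makes statistics computed across a batch unreliable.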
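
A minimal sketch of why LayerNorm is insensitive to sequence length: it normalizes each token's feature vector on its own. The learnable scale and shift used in practice are left out, and the function name, `eps` value and toy data are assumptions for illustration.

```python
import math

def layer_norm(x, eps=1e-5):
    """Normalize one token's feature vector to zero mean and unit variance.
    The learnable scale and shift used in practice are omitted."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

# Two sequences of different lengths; each token is a 4-dimensional vector.
short_seq = [[1.0, 2.0, 3.0, 4.0]]
long_seq = [[1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 8.0, 8.0], [5.0, 1.0, 5.0, 1.0]]

# Statistics come from each token alone, so the first token is normalized
# identically no matter how long its sequence is or what else is in the batch.
normed_short = [layer_norm(tok) for tok in short_seq]
normed_long = [layer_norm(tok) for tok in long_seq]
```

BatchNorm, by contrast, would compute its mean and variance across samples, so padding and varying lengths would change the statistics.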

2.1 How the attention mechanism works

Attention is the core of the Transformer. Its main goal is to identify and attend to the most important features in the input.

Attention: What part of the input should we focus on?

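Attention comes in two main forms, additive and multiplicative (dot-product); the Transformer uses scaled dot-product attention. N is the number of stacked encoder layers (N = 6 in the paper). Every sub-layer is wrapped in a residual connection, and the decoder's sub-layers are connected the same way. Masked Multi-Head Attention is attention with a mask: during training, position t cannot see any data that comes after t.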
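
The masking idea can be sketched as follows. This is a single-head toy example rather than the full multi-head layer; the function names and the raw score values are hypothetical. Scores for future positions are set to negative infinity before the softmax, so their weights become exactly zero.

```python
import math

def softmax(scores):
    """Numerically stable softmax; a score of -inf gets weight exactly 0."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def causal_attention_weights(scores):
    """Apply a causal mask to a square score matrix, then softmax each row.
    Row t may only attend to columns 0..t."""
    n = len(scores)
    weights = []
    for t in range(n):
        masked = [scores[t][j] if j <= t else float("-inf") for j in range(n)]
        weights.append(softmax(masked))
    return weights

# Hypothetical raw scores for a 3-token sequence.
scores = [[0.1, 0.9, 0.3],
          [0.5, 0.2, 0.7],
          [0.4, 0.4, 0.4]]
weights = causal_attention_weights(scores)
```

The first row puts all of its weight on position 0, and every row has zero weight above the diagonal, so no position draws on later tokens.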

How attention is used in the Transformer (encoder-decoder):

Encode position information
Extract query, key, value for search
Compute attention weighting
Extract features with high attention
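
The four steps above can be sketched end to end in plain Python. This is a single-head toy version under stated assumptions: the sinusoidal encoding follows the formula in the paper, but the helper names, the toy embeddings and the identity projection matrices are hypothetical and chosen only to keep the example small. A real model learns the projections and uses several heads.

```python
import math

def positional_encoding(seq_len, d_model):
    """Sinusoidal position encoding: sin on even dimensions, cos on odd ones."""
    pe = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            angle = pos / (10000 ** (2 * (i // 2) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe

def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x, w_q, w_k, w_v):
    # Step 2: extract query, key, value with linear projections.
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    # Step 3: attention weights = softmax(Q K^T / sqrt(d_k)), row by row.
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    # Step 4: the weighted sum of values keeps the features with high attention.
    return matmul(weights, v), weights

# Hypothetical toy input: 3 tokens, d_model = 4.
embeddings = [[1.0, 0.0, 1.0, 0.0],
              [0.0, 2.0, 0.0, 2.0],
              [1.0, 1.0, 1.0, 1.0]]
# Step 1: add position information to the token embeddings.
pe = positional_encoding(3, 4)
x = [[e + p for e, p in zip(er, pr)] for er, pr in zip(embeddings, pe)]

# Identity projections, used only to keep the example easy to follow.
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
output, weights = self_attention(x, identity, identity, identity)
```

Each row of `weights` sums to 1 and says how much one token draws from every other token; `output` has the same shape as the input.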

References:

https://www.youtube.com/watch?v=TQQlZhbC5ps

https://www.youtube.com/watch?v=nzqlFIcCSWQ

https://arxiv.org/abs/1706.03762

https://www.youtube.com/watch?v=ySEx_Bqxvvo&t=117s

Reposted from blog.csdn.net/weixin_44897685/article/details/130920338
