TinyBERT: Brief Notes


Category: Other | 2020-05-19 23:40:40

TinyBERT:

　　Proposes a distillation method designed specifically for the Transformer architecture (Transformer distillation).

　　A two-stage framework:

　　　　a. the pre-training stage (general distillation)

　　　　b. the fine-tuning stage (task-specific distillation)

　　Both the embedding layer and the attention are compressed.

Knowledge distillation (KD)

　　The goal is to design behavior functions f and a loss function L so that the student network learns the teacher network's knowledge as well as possible.
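This can be written out using the notation of the TinyBERT paper. Here $f^T$ and $f^S$ are the teacher's and student's behavior functions and $\mathcal{X}$ is the training data. The KD objective is then a sum of per-example losses:

```latex
\mathcal{L}_{KD} = \sum_{x \in \mathcal{X}} L\big(f^{T}(x),\; f^{S}(x)\big)
```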

Transformer distillation:

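As the figure shows, the student has M Transformer layers and the teacher has N, with M < N. We therefore want each student layer to correspond to some teacher layer, i.e., we look for a mapping n = g(m) from student layer m to teacher layer n. TinyBERT also distills the embedding layer and the prediction layer, which gives 0 = g(0) and N+1 = g(M+1). Formally, we need to minimize the following objective: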
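The TinyBERT paper writes this objective as shown below. $\lambda_m$ is a hyperparameter that weights the distillation of the m-th layer, and $\mathcal{L}_{layer}$ is the per-layer loss built from the terms listed below. In the paper's 4-layer-student / 12-layer-teacher setting, the mapping is the uniform $g(m) = 3m$.

```latex
\mathcal{L}_{model} = \sum_{x \in \mathcal{X}} \sum_{m=0}^{M+1} \lambda_m \,
  \mathcal{L}_{layer}\big(f_m^{S}(x),\; f_{g(m)}^{T}(x)\big)
```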

Attention loss
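The TinyBERT paper defines the attention loss as an MSE averaged over the h heads: $\mathcal{L}_{attn} = \frac{1}{h}\sum_{i=1}^{h} \mathrm{MSE}(A_i^S, A_i^T)$. Each $A_i$ is an l×l attention matrix for an input of length l. The paper fits the unnormalized attention scores rather than their softmax. Below is a minimal plain-Python sketch for a single input. It assumes the student and teacher have the same number of heads, and the function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def mse(a: Matrix, b: Matrix) -> float:
    """Mean squared error over all entries of two matrices with the same shape."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("matrices must have the same shape")
    total = sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    count = sum(len(ra) for ra in a)
    return total / count


def attention_loss(student_heads: List[Matrix], teacher_heads: List[Matrix]) -> float:
    """L_attn = (1/h) * sum_i MSE(A_i^S, A_i^T) over the h attention heads.

    Each A_i is an l x l matrix of (unnormalized) attention scores; the student
    and teacher are assumed to have the same number of heads h.
    """
    if len(student_heads) != len(teacher_heads):
        raise ValueError("student and teacher must have the same number of heads")
    h = len(teacher_heads)
    return sum(mse(s, t) for s, t in zip(student_heads, teacher_heads)) / h


# Usage: two heads, sequence length l = 2.
student = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [0.5, 0.5]]]
teacher = [[[0.0, 0.0], [0.0, 0.0]], [[0.5, 0.5], [0.5, 0.5]]]
print(attention_loss(student, teacher))  # 0.25
```

Because the attention matrices are l×l regardless of hidden size, no projection is needed here, unlike for hidden states.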

Hidden state loss
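The TinyBERT paper defines the hidden-state loss as $\mathcal{L}_{hidn} = \mathrm{MSE}(H^S W_h, H^T)$. The student's hidden size d' is usually smaller than the teacher's d, so a learnable matrix $W_h \in \mathbb{R}^{d' \times d}$ projects the student states into the teacher's space. The plain-Python sketch below handles one input and uses a fixed projection for illustration. In training, $W_h$ would be a trained parameter, and the function names here are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b for nested lists."""
    inner = len(b)
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions do not match")
    cols = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)] for row in a]


def mse(a: Matrix, b: Matrix) -> float:
    """Mean squared error over all entries of two matrices with the same shape."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("matrices must have the same shape")
    total = sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return total / sum(len(ra) for ra in a)


def hidden_state_loss(h_student: Matrix, h_teacher: Matrix, w_h: Matrix) -> float:
    """L_hidn = MSE(H^S W_h, H^T).

    H^S is l x d' (student width), H^T is l x d (teacher width),
    and W_h is the d' x d projection.
    """
    return mse(matmul(h_student, w_h), h_teacher)


# Usage: l = 1, d' = 2, d = 3, with a fixed (not learned) projection.
w_h = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
print(hidden_state_loss([[1.0, 2.0]], [[0.0, 0.0, 0.0]], w_h))  # (1 + 4 + 0) / 3
```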

Embedding loss
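The TinyBERT paper defines the embedding loss as an MSE between the projected student embeddings and the teacher embeddings. Here l is the input length, d' and d are the student and teacher embedding sizes, and $W_e$ is a learnable projection:

```latex
\mathcal{L}_{embd} = \mathrm{MSE}\big(E^{S} W_{e},\; E^{T}\big), \qquad
E^{S} \in \mathbb{R}^{l \times d'},\;
E^{T} \in \mathbb{R}^{l \times d},\;
W_{e} \in \mathbb{R}^{d' \times d}
```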

Prediction loss
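The TinyBERT paper defines the prediction-layer loss as a soft cross-entropy between teacher and student logits with temperature t: $\mathcal{L}_{pred} = \mathrm{CE}(z^T / t, z^S / t) = -\,\mathrm{softmax}(z^T/t) \cdot \log\mathrm{softmax}(z^S/t)$. Below is a minimal plain-Python sketch for a single example. The function names are hypothetical, and a real implementation would batch this with a tensor library.

```python
import math
from typing import List


def softmax(z: List[float], t: float) -> List[float]:
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [v / t for v in z]
    m = max(scaled)
    exps = [math.exp(v - m) for v in scaled]
    s = sum(exps)
    return [e / s for e in exps]


def log_softmax(z: List[float], t: float) -> List[float]:
    """Temperature-scaled log-softmax computed via log-sum-exp."""
    scaled = [v / t for v in z]
    m = max(scaled)
    log_sum = m + math.log(sum(math.exp(v - m) for v in scaled))
    return [v - log_sum for v in scaled]


def prediction_loss(z_student: List[float], z_teacher: List[float], t: float = 1.0) -> float:
    """Soft cross-entropy: -sum_k softmax(z^T/t)_k * log softmax(z^S/t)_k."""
    p_teacher = softmax(z_teacher, t)
    log_p_student = log_softmax(z_student, t)
    return -sum(p * lq for p, lq in zip(p_teacher, log_p_student))


# Usage: identical logits give the entropy of the teacher distribution (log 2 here).
print(prediction_loss([0.0, 0.0], [0.0, 0.0]))
```

The teacher's soft distribution is the target, so the loss is minimized when the student's distribution matches it.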

Putting the four terms together, we obtain the following loss:
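In the TinyBERT paper, the per-layer loss is piecewise:

- $\mathcal{L}_{embd}$ for $m = 0$
- $\mathcal{L}_{hidn} + \mathcal{L}_{attn}$ for $0 < m \le M$
- $\mathcal{L}_{pred}$ for $m = M + 1$

The minimal sketch below shows how these pieces combine. It assumes the uniform layer mapping $g(m) = m \cdot N / M$, which is one of several mapping strategies and requires N to be a multiple of M. The hypothetical `distillation_plan` function lists, for every student layer, its paired teacher layer and which loss terms apply.

```python
from typing import List, Tuple


def distillation_plan(M: int, N: int) -> List[Tuple[int, int, str]]:
    """Return (student layer m, teacher layer g(m), loss terms) for m = 0..M+1.

    Assumes the uniform mapping g(m) = m * N / M for the Transformer layers,
    which requires N to be a positive multiple of M.
    """
    if M <= 0 or N % M != 0:
        raise ValueError("this sketch needs N to be a positive multiple of M")
    step = N // M
    plan = [(0, 0, "embd")]  # embedding layer: 0 = g(0)
    for m in range(1, M + 1):
        plan.append((m, m * step, "hidn+attn"))  # Transformer layers
    plan.append((M + 1, N + 1, "pred"))  # prediction layer: N+1 = g(M+1)
    return plan


# Usage: a 4-layer student distilled from a 12-layer teacher.
for m, n, terms in distillation_plan(4, 12):
    print(f"student {m} -> teacher {n}: {terms}")
```

For M = 4 and N = 12 this pairs student layers 1 to 4 with teacher layers 3, 6, 9 and 12. In a real training loop, each entry would select the student and teacher outputs to feed into the corresponding loss, and that loss would be weighted by $\lambda_m$.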


Reposted from www.cnblogs.com/skykill/p/12920375.html
