Paper walkthrough: Decoupled Knowledge Distillation

Other 2023-04-08 08:32:25

1. Paper information

Paper: Decoupled Knowledge Distillation
Link: https://arxiv.org/pdf/2203.08679.pdf
Code: https://github.com/megvii-research/mdistiller

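2. Background and abstract

Knowledge distillation comes in many flavors. Given a teacher model, it can be studied, and losses designed, from the response, relation, or feature perspective. In this paper, the authors argue that the most basic form, KD, still has a lot of untapped potential. They therefore improve KD by decoupling the information carried by the teacher model and supervising each part separately. The resulting accuracy reaches SOTA, giving KD a new lease of life.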

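3. Method

KD proposes that the teacher model contains dark knowledge, which carries more information than the hard label, so the student model can learn more from it. However, the teacher's soft label has a drawback: it is heavily polarized. The top-1 class gets a very high score (close to 1), while the scores of all other classes are close to 0. For this reason it is generally recommended to set a temperature coefficient t that smooths the labels and increases the dark knowledge exposed by the teacher model.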
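To make the effect of the temperature concrete, here is a minimal sketch in plain Python. The function name `softmax_with_temperature` and the logit values are made up for illustration and are not taken from the paper or from mdistiller.

```python
import math

def softmax_with_temperature(logits, t=1.0):
    """Softmax of logits / t; a larger t gives a flatter distribution."""
    scaled = [z / t for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up teacher logits: the first class dominates.
teacher_logits = [10.0, 2.0, 1.0, 0.0]

sharp = softmax_with_temperature(teacher_logits, t=1.0)   # polarized soft label
smooth = softmax_with_temperature(teacher_logits, t=4.0)  # smoothed soft label

print([round(p, 4) for p in sharp])
print([round(p, 4) for p in smooth])
```

With t=1 almost all of the probability mass sits on the top-1 class. With t=4 the non-top classes receive visibly more mass, which is the extra signal the student is supposed to learn from.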

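This approach has a drawback of its own: different teacher models may each need a different temperature coefficient (because they fit the data to different degrees), which makes distillation harder.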
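The dependence on the teacher can be illustrated with a small sketch. The two "teachers" below are hypothetical logit vectors, with teacher B simply twice as confident as teacher A. Entropy is used here as a rough measure of how soft a label is; that choice is an assumption made for this example, not something the paper prescribes.

```python
import math

def softmax(logits, t=1.0):
    """Temperature-scaled softmax."""
    scaled = [z / t for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats; higher means a softer distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

# Hypothetical logits: teacher B equals teacher A scaled by 2.
teacher_a = [4.0, 1.0, 0.5, 0.0]
teacher_b = [8.0, 2.0, 1.0, 0.0]

for t in (1.0, 2.0, 4.0):
    h_a = entropy(softmax(teacher_a, t))
    h_b = entropy(softmax(teacher_b, t))
    print(f"t={t}: entropy A={h_a:.3f}, entropy B={h_b:.3f}")
```

At the same temperature, teacher B always produces the sharper label. It needs t=2 to reach the softness teacher A already has at t=1, so a temperature tuned for one teacher does not transfer directly to the other.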

In the course of their study, the authors found that the KD loss can in fact be decoupled, into

ta
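The decoupling can be sketched numerically. The code below follows the decomposition as stated in the DKD paper rather than anything recoverable from this post: the KD loss splits into a target-class term (TCKD, a binary KL over "target vs. everything else") and a non-target-class term (NCKD, a KL over the non-target classes renormalized among themselves), with NCKD weighted by one minus the teacher's target-class probability. All function names, the logit values, and the default `alpha` and `beta` are placeholders chosen for illustration. The sketch also omits the temperature-squared scaling commonly applied to KD losses and is not the mdistiller implementation.

```python
import math

def softmax(logits, t=1.0):
    """Temperature-scaled softmax."""
    scaled = [z / t for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl(p, q):
    """KL divergence KL(p || q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def decoupled_terms(teacher_logits, student_logits, target, t=1.0):
    """Return (kd, tckd, nckd, non_target_weight) for a single sample."""
    probs_teacher = softmax(teacher_logits, t)
    probs_student = softmax(student_logits, t)
    pt_teacher = probs_teacher[target]
    pt_student = probs_student[target]

    # TCKD: binary distribution "target vs. all other classes".
    tckd = kl([pt_teacher, 1.0 - pt_teacher], [pt_student, 1.0 - pt_student])

    # NCKD: distribution over non-target classes only, renormalized.
    hat_teacher = [p / (1.0 - pt_teacher)
                   for i, p in enumerate(probs_teacher) if i != target]
    hat_student = [p / (1.0 - pt_student)
                   for i, p in enumerate(probs_student) if i != target]
    nckd = kl(hat_teacher, hat_student)

    kd = kl(probs_teacher, probs_student)  # classic KD term, for comparison
    return kd, tckd, nckd, 1.0 - pt_teacher

def dkd_loss(teacher_logits, student_logits, target, alpha=1.0, beta=2.0, t=1.0):
    """Weighted sum of the two terms; alpha and beta are placeholder values."""
    _, tckd, nckd, _ = decoupled_terms(teacher_logits, student_logits, target, t)
    return alpha * tckd + beta * nckd

# Made-up logits for one sample whose ground-truth class is index 0.
kd, tckd, nckd, weight = decoupled_terms([5.0, 2.0, 1.0, 0.0],
                                         [3.0, 2.5, 0.5, 0.0], target=0)
print(f"KD={kd:.6f}  TCKD + weight*NCKD={tckd + weight * nckd:.6f}")
```

The two printed numbers agree, which checks the identity KD = TCKD + (1 - p_t) * NCKD on this sample. Replacing the coupled weight (1 - p_t) with a free hyperparameter `beta` is what lets the two terms be supervised separately.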

Reposted from blog.csdn.net/u012526003/article/details/124562162