A Summary of Distillation-Based Model Compression Algorithms
Original document: https://www.yuque.com/lart/gw5mta/scisva
Written September 7, 2019
Mind map of the original document: http://naotu.baidu.com/file/f60fea22a9ed0ea7236ca9a70ff1b667?token=dab31b70fffa034a (access code: kdxj)
Output alignment
Distilling the Knowledge in a Neural Network (NIPS 2014)
- Trains the student on the teacher model's soft targets, i.e. temperature-softened output distributions (sketch below)
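A minimal sketch of the soft-target loss, assuming PyTorch (the note names no framework); `T` (temperature) and `alpha` (soft/hard mixing weight) are illustrative hyperparameters, not values from the paper:

```python
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Soft-target distillation: KL between temperature-softened distributions,
    mixed with the usual cross-entropy on the hard labels."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # T^2 keeps soft-target gradients on the same scale as the hard loss
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```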
Deep Mutual Learning (CVPR 2018)
- Trains multiple student networks alternately so that they teach and improve each other (sketch below)
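A sketch of the mutual-learning losses for two peers, assuming PyTorch; in the paper each network is updated in turn using its own loss:

```python
import torch.nn.functional as F

def dml_losses(logits_a, logits_b, labels):
    """Deep mutual learning: each peer fits the labels plus the other peer's
    predicted distribution; peers are updated alternately with their own loss."""
    ce_a = F.cross_entropy(logits_a, labels)
    ce_b = F.cross_entropy(logits_b, labels)
    kl_a = F.kl_div(F.log_softmax(logits_a, dim=1),
                    F.softmax(logits_b, dim=1).detach(), reduction="batchmean")
    kl_b = F.kl_div(F.log_softmax(logits_b, dim=1),
                    F.softmax(logits_a, dim=1).detach(), reduction="batchmean")
    return ce_a + kl_a, ce_b + kl_b
```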
Born Again Neural Networks (ICML 2018)
- Student 1 is distilled from the teacher, then student i teaches student i+1 in sequence; the final model ensembles all students (sketch below)
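A rough sketch of the born-again training chain; `make_student` and `distill` are hypothetical helpers standing in for model construction and one full distillation run:

```python
def born_again_chain(teacher, make_student, distill, n_generations=3):
    """Train a sequence of students, each distilled from the previous one;
    at test time, average the predictions of all trained students."""
    students, current_teacher = [], teacher
    for _ in range(n_generations):
        student = make_student()           # fresh student, typically same architecture
        distill(student, current_teacher)  # one full distillation run
        students.append(student)
        current_teacher = student          # student i becomes the teacher of student i+1
    return students                        # ensemble: mean of the students' softmax outputs
```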
Direct alignment
Fitting attention maps
Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer (ICLR 2017)
- Aligns, at each stage, the single-channel attention map obtained by fusing the feature's channels (sketch below)
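A sketch of the channel-fused attention map and the transfer loss, assuming PyTorch features of shape (B, C, H, W) with matching spatial sizes between the two networks' stages:

```python
import torch.nn.functional as F

def attention_map(feat, p=2):
    """Fuse a (B, C, H, W) feature across channels into one normalized map."""
    amap = feat.abs().pow(p).sum(dim=1).flatten(1)  # (B, H*W)
    return F.normalize(amap, dim=1)                 # L2-normalize per sample

def at_loss(feat_s, feat_t, p=2):
    """Match the student's and teacher's attention maps at one stage."""
    return (attention_map(feat_s, p) - attention_map(feat_t, p)).pow(2).mean()
```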
Learning Lightweight Lane Detection CNNs by Self Attention Distillation (ICCV 2019)
- Computes an attention map at each stage of the network via channel fusion and aligns earlier stages' attention maps to later ones (sketch below)
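A sketch of the self-distillation term, reusing the `attention_map` helper above; the bilinear resize is an assumption for reconciling stages of different resolution:

```python
import torch.nn.functional as F

def sad_loss(feat_early, feat_late, p=2):
    """Earlier stage mimics the later stage's attention map; the target is
    detached so the gradient only improves the earlier layers."""
    if feat_early.shape[2:] != feat_late.shape[2:]:
        # assumption: bilinear resize to match the later stage's resolution
        feat_early = F.interpolate(feat_early, size=feat_late.shape[2:],
                                   mode="bilinear", align_corners=False)
    return F.mse_loss(attention_map(feat_early, p),
                      attention_map(feat_late, p).detach())
```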
Fitting features
FitNets: Hints for Thin Deep Nets (ICLR 2015)
- Stage one uses a regressor module to align the output features of a partial student network with those of a partial teacher network; stage two trains on soft targets (sketch below)
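A sketch of the stage-one hint loss, assuming PyTorch and equal spatial sizes so a 1x1 convolution suffices as the regressor; stage two can reuse a soft-target loss like `kd_loss` above:

```python
import torch.nn as nn
import torch.nn.functional as F

class HintRegressor(nn.Module):
    """Stage one of FitNets: a learned regressor maps the student's 'guided'
    feature to the shape of the teacher's 'hint' feature for an L2 comparison."""
    def __init__(self, c_student, c_teacher):
        super().__init__()
        # 1x1 conv assumes matching spatial sizes between the two features
        self.regressor = nn.Conv2d(c_student, c_teacher, kernel_size=1)

    def forward(self, guided_feat, hint_feat):
        return F.mse_loss(self.regressor(guided_feat), hint_feat.detach())
```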
Relation alignment
Fitting pairwise relations between features
A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning (CVPR 2017)
- Aligns the relations (FSP matrices) computed between the channels of adjacent stages' features (sketch below)
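A sketch of the FSP matrix and its alignment loss, assuming PyTorch features whose spatial sizes have already been matched (e.g. by pooling):

```python
import torch

def fsp_matrix(feat_a, feat_b):
    """Inner products between every channel of one stage and every channel of
    the next; spatial sizes are assumed matched beforehand."""
    b, c1, h, w = feat_a.shape
    fa = feat_a.flatten(2)                              # (B, C1, H*W)
    fb = feat_b.flatten(2)                              # (B, C2, H*W)
    return torch.bmm(fa, fb.transpose(1, 2)) / (h * w)  # (B, C1, C2)

def fsp_loss(feats_s, feats_t):
    """Align student and teacher FSP matrices over adjacent-stage feature pairs."""
    loss = 0.0
    for (sa, sb), (ta, tb) in zip(feats_s, feats_t):
        loss = loss + (fsp_matrix(sa, sb) - fsp_matrix(ta, tb).detach()).pow(2).mean()
    return loss
```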
Graph-based Knowledge Distillation by Multi-head Attention Network (BMVC 2019)
- Uses non-local (multi-head attention) operations to mine the relations between adjacent stages' features after SVD compression
Fitting relations embedded in the outputs
Similarity-Preserving Knowledge Distillation (ICCV 2019)
- Aligns the pairwise similarities between the output features of all samples within a batch (sketch below)
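A sketch of the batch-similarity loss, assuming PyTorch; features are flattened per sample before computing the (B, B) similarity matrix:

```python
import torch
import torch.nn.functional as F

def sp_loss(feat_s, feat_t):
    """Match the (B, B) sample-similarity matrices of student and teacher."""
    def sim_matrix(feat):
        f = feat.flatten(1)                   # (B, D): one row per sample
        return F.normalize(f @ f.t(), dim=1)  # row-normalized pairwise similarities
    g_s = sim_matrix(feat_s)
    g_t = sim_matrix(feat_t).detach()
    b = g_s.shape[0]
    return (g_s - g_t).pow(2).sum() / (b * b)
```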
Relational Knowledge Distillation (CVPR 2019)
- Aligns the distance relations between the outputs of any sample pair and the angle relations among the outputs of any sample triplet in a batch (sketch below)
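A sketch of the two RKD terms, assuming PyTorch embeddings of shape (B, D); the Huber (smooth L1) comparison follows the paper:

```python
import torch
import torch.nn.functional as F

def rkd_distance_loss(emb_s, emb_t):
    """Distance-wise RKD: match mean-normalized pairwise distances."""
    def norm_pdist(e):
        d = torch.cdist(e, e)       # (B, B) Euclidean distances
        return d / d[d > 0].mean()  # normalize by the mean non-zero distance
    return F.smooth_l1_loss(norm_pdist(emb_s), norm_pdist(emb_t).detach())

def rkd_angle_loss(emb_s, emb_t):
    """Angle-wise RKD: match the cosine of the angle formed at each embedding
    by every pair of other embeddings in the batch."""
    def angles(e):
        diff = F.normalize(e.unsqueeze(0) - e.unsqueeze(1), dim=2)  # (B, B, D)
        return torch.bmm(diff, diff.transpose(1, 2))                # (B, B, B)
    return F.smooth_l1_loss(angles(emb_s), angles(emb_t).detach())
```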
Data Distillation: Towards Omni-Supervised Learning (CVPR 2018)
- Teacher and student architectures may be the same or different; ensembles the teacher's outputs on differently transformed copies of each sample (sketch below)
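A rough sketch of the label-ensembling step; `transforms` and `inverse_transforms` are hypothetical callables (e.g. flips and multi-scale resizes paired with their inverses):

```python
import torch

@torch.no_grad()
def ensembled_label(teacher, image, transforms, inverse_transforms):
    """Average the teacher's predictions over transformed copies of one sample;
    the result serves as the label for student training."""
    teacher.eval()
    preds = [inv(teacher(t(image))) for t, inv in zip(transforms, inverse_transforms)]
    return torch.stack(preds).mean(dim=0)
```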
Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results (NIPS 2017)
- A semi-supervised method: the teacher's weights are the exponential moving average of the student's weights over training steps, and a consistency loss constrains the two models' predictions (sketch below)
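A sketch of the EMA update and the consistency term, assuming PyTorch; `decay` is an illustrative value:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def ema_update(teacher, student, decay=0.999):
    """Teacher weights = exponential moving average of the student's weights."""
    for t_p, s_p in zip(teacher.parameters(), student.parameters()):
        t_p.mul_(decay).add_(s_p, alpha=1.0 - decay)

def consistency_loss(student_logits, teacher_logits):
    """Constrain student and teacher to agree on (differently perturbed) inputs."""
    return F.mse_loss(F.softmax(student_logits, dim=1),
                      F.softmax(teacher_logits, dim=1).detach())
```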
Fitting relations within the features themselves
Knowledge Adaptation for Efficient Semantic Segmentation (CVPR 2019)
- The teacher's features are transformed by an autoencoder; the student model uses an adaptation unit to adapt to the teacher's features
Structured Knowledge Distillation for Semantic Segmentation (CVPR 2019)
- Combines soft targets with GAN-based fitting of higher-level (holistic) information