[Model Compression] A Summary of Distillation Algorithms

Original document: https://www.yuque.com/lart/gw5mta/scisva

Google Slides: https://docs.google.com/presentation/d/e/2PACX-1vSsa5X_zfuJUPgxUL7vu8MHbkj3JnUzIlKbf-eXkYivhwiFZRVx_NqhSxBbYDu-1c2D7ucBX_Rlf9kD/pub?start=false&loop=false&delayms=3000

Created on September 7, 2019


Mind map of the original document: http://naotu.baidu.com/file/f60fea22a9ed0ea7236ca9a70ff1b667?token=dab31b70fffa034a (kdxj)

Output alignment

Distilling the Knowledge in a Neural Network (NIPS 2014)

  • Train the student with the soft targets produced by the teacher model

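A minimal PyTorch sketch of the soft-target loss (the temperature `T` and the weight `alpha` in the comment are placeholder hyperparameters, not values from the paper):

```python
import torch.nn.functional as F

def soft_target_loss(student_logits, teacher_logits, T=4.0):
    """KL divergence between temperature-softened teacher and student
    distributions, scaled by T^2 so gradients keep a comparable magnitude."""
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    p_teacher = F.softmax(teacher_logits.detach() / T, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T ** 2)

# Typical total loss: hard cross-entropy on the labels plus the soft term.
# loss = F.cross_entropy(student_logits, labels) + alpha * soft_target_loss(student_logits, teacher_logits)
```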

Deep Mutual Learning (CVPR 2018)

  • Multiple student networks are trained alternately and improve each other

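A rough sketch of one mutual-learning step for two students, assuming a standard classification setup in PyTorch (my paraphrase of the idea, not the paper's code):

```python
import torch
import torch.nn.functional as F

def dml_step(net1, net2, opt1, opt2, images, labels):
    """Alternating update: each student fits the labels plus a KL term toward
    the other student's current predictions."""
    # Update student 1 against student 2's (frozen) predictions.
    logits1 = net1(images)
    with torch.no_grad():
        logits2 = net2(images)
    loss1 = F.cross_entropy(logits1, labels) + F.kl_div(
        F.log_softmax(logits1, dim=1), F.softmax(logits2, dim=1), reduction="batchmean")
    opt1.zero_grad()
    loss1.backward()
    opt1.step()

    # Then update student 2 against the freshly updated student 1.
    logits2 = net2(images)
    with torch.no_grad():
        logits1 = net1(images)
    loss2 = F.cross_entropy(logits2, labels) + F.kl_div(
        F.log_softmax(logits2, dim=1), F.softmax(logits1, dim=1), reduction="batchmean")
    opt2.zero_grad()
    loss2.backward()
    opt2.step()
```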

Born Again Neural Networks (ICML 2018)

  • Student 1 is trained by the teacher, then student i trains student i+1 in turn; the final model is an ensemble of all students

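A sketch of the born-again training loop as I read the bullet above; `make_student` and `train_with_distillation` are hypothetical helpers standing in for ordinary supervised plus distillation training:

```python
import copy
import torch

def born_again(teacher, make_student, train_with_distillation, num_generations=3):
    """Train a chain of students, each distilled from the previous generation,
    then ensemble all trained students by averaging their logits."""
    students, current_teacher = [], teacher
    for _ in range(num_generations):
        student = make_student()                            # fresh network of the same size
        train_with_distillation(student, current_teacher)   # generation i+1 learns from generation i
        students.append(student)
        current_teacher = copy.deepcopy(student).eval()     # freeze it as the next teacher

    def ensemble(x):
        with torch.no_grad():
            return torch.stack([s(x) for s in students]).mean(dim=0)
    return ensemble
```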

Direct alignment

Fitting attention maps

Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer (ICLR 2017)

  • At each stage, align the single-channel attention map obtained by fusing the channels of the feature map

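A sketch of activation-based attention transfer in PyTorch: sum the squared activations over channels, L2-normalise the flattened map, and match teacher and student stage by stage (the interpolation for mismatched spatial sizes is my assumption):

```python
import torch
import torch.nn.functional as F

def attention_map(feat):
    """(B, C, H, W) -> (B, H*W): channel-fused, L2-normalised attention map."""
    att = feat.pow(2).sum(dim=1).flatten(1)
    return F.normalize(att, p=2, dim=1)

def at_loss(student_feats, teacher_feats):
    """Sum of distances between student/teacher attention maps at each stage."""
    loss = 0.0
    for fs, ft in zip(student_feats, teacher_feats):
        if fs.shape[-2:] != ft.shape[-2:]:   # resize if the spatial sizes differ
            fs = F.interpolate(fs, size=ft.shape[-2:], mode="bilinear", align_corners=False)
        loss = loss + (attention_map(fs) - attention_map(ft.detach())).pow(2).mean()
    return loss
```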

Learning Lightweight Lane Detection CNNs by Self Attention Distillation (ICCV 2019)

  • Compute an attention map from each stage's features via channel fusion, and align the attention maps output by earlier stages with those of later stages

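A sketch of the self-attention-distillation loss: there is no teacher network, so each stage's attention map is trained to mimic the following (detached) stage's; resizing everything to a common spatial size is my assumption:

```python
import torch
import torch.nn.functional as F

def sad_loss(stage_feats):
    """stage_feats: list of (B, C, H, W) features from successive stages of the
    same network; earlier attention maps mimic the later ones."""
    def att(feat, size):
        a = feat.pow(2).sum(dim=1, keepdim=True)            # channel fusion -> (B, 1, H, W)
        a = F.interpolate(a, size=size, mode="bilinear", align_corners=False)
        return F.normalize(a.flatten(1), p=2, dim=1)

    size = stage_feats[-1].shape[-2:]
    loss = 0.0
    for earlier, later in zip(stage_feats[:-1], stage_feats[1:]):
        loss = loss + F.mse_loss(att(earlier, size), att(later.detach(), size))
    return loss
```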

Fitting features

FitNets: Hints for Thin Deep Nets (ICLR 2015)

  • Stage one uses a regressor module to align the output features of part of the student network with those of part of the teacher network; stage two trains with soft targets

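A sketch of the stage-one hint loss: a small regressor (here a 1x1 convolution; the channel counts are placeholder assumptions) maps the student's guided-layer features onto the teacher's hint-layer features, which are then matched with L2. Stage two continues with soft targets as in the first section.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HintRegressor(nn.Module):
    """Projects student guided-layer features to the teacher hint layer's width."""
    def __init__(self, student_channels=64, teacher_channels=256):
        super().__init__()
        self.proj = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)

    def forward(self, student_feat):
        return self.proj(student_feat)

def hint_loss(regressor, student_feat, teacher_feat):
    """Stage one: L2 between regressed student features and fixed teacher hints."""
    return F.mse_loss(regressor(student_feat), teacher_feat.detach())
```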

Relation alignment

Fitting pairwise relations between features

A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning (CVPR 2017)

  • Compute the relations between the channels of features from adjacent stages (the FSP matrix) and align them between teacher and student

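A sketch of the FSP ("flow") matrix this paper aligns: a Gram matrix between the channels of two features taken from adjacent stages (assumed to share the same spatial size), computed for both networks and matched with L2:

```python
import torch
import torch.nn.functional as F

def fsp_matrix(feat_a, feat_b):
    """(B, Ca, H, W) x (B, Cb, H, W) -> (B, Ca, Cb) channel-relation matrix."""
    b, ca, h, w = feat_a.shape
    cb = feat_b.shape[1]
    fa = feat_a.reshape(b, ca, h * w)
    fb = feat_b.reshape(b, cb, h * w)
    return torch.bmm(fa, fb.transpose(1, 2)) / (h * w)

def fsp_loss(student_pairs, teacher_pairs):
    """student_pairs / teacher_pairs: lists of (earlier_feat, later_feat) tuples."""
    loss = 0.0
    for (sa, sb), (ta, tb) in zip(student_pairs, teacher_pairs):
        loss = loss + F.mse_loss(fsp_matrix(sa, sb), fsp_matrix(ta, tb).detach())
    return loss
```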

Graph-based Knowledge Distillation by Multi-head Attention Network (BMVC 2019)

  • Use a non-local (multi-head attention) module to mine the relations between adjacent stages' features after compressing them with singular value decomposition

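A very rough stand-in for the idea (not the paper's architecture): compress each stage's features with a truncated SVD, form a non-local-style affinity between the compressed features of adjacent stages, and make the student mimic the teacher's affinity:

```python
import torch
import torch.nn.functional as F

def svd_compress(feat, k=4):
    """(B, C, H, W) -> (B, H*W, k): project onto the top-k right singular directions."""
    x = feat.flatten(2).transpose(1, 2)                   # (B, H*W, C)
    _, _, vh = torch.linalg.svd(x, full_matrices=False)
    return x @ vh[:, :k, :].transpose(1, 2)

def stage_affinity(front_feat, back_feat, k=4):
    """Non-local-style affinity between two adjacent stages' compressed features."""
    q = F.normalize(svd_compress(front_feat, k), dim=-1)
    kk = F.normalize(svd_compress(back_feat, k), dim=-1)
    return torch.softmax(q @ kk.transpose(1, 2), dim=-1)  # (B, HW_front, HW_back)

def graph_kd_loss(student_pairs, teacher_pairs):
    loss = 0.0
    for (sf, sb), (tf, tb) in zip(student_pairs, teacher_pairs):
        s_aff = stage_affinity(sf, sb).clamp_min(1e-8).log()
        t_aff = stage_affinity(tf, tb).detach()
        loss = loss + F.kl_div(s_aff, t_aff, reduction="batchmean")
    return loss
```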

Fitting relations contained in the outputs

Similarity-Preserving Knowledge Distillation (ICCV 2019)

  • Preserve the relations between the output features corresponding to the samples within an entire batch

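A sketch of the similarity-preserving loss: build a batch-by-batch similarity matrix from each network's features and match the two matrices, so only the relations between samples are transferred, not the features themselves:

```python
import torch
import torch.nn.functional as F

def batch_similarity(feat):
    """(B, ...) features -> (B, B) row-normalised pairwise similarity matrix."""
    f = feat.flatten(1)
    return F.normalize(f @ f.t(), p=2, dim=1)

def sp_loss(student_feat, teacher_feat):
    """Mean squared (Frobenius-style) distance between the similarity matrices."""
    gs = batch_similarity(student_feat)
    gt = batch_similarity(teacher_feat).detach()
    return (gs - gt).pow(2).mean()
```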

Relational Knowledge Distillation (CVPR 2019)

  • Distil the distance relations between the outputs of any two samples in a batch and the angle relations among the outputs of any three

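A sketch of the two relational terms in PyTorch: a distance-wise loss on mean-normalised pairwise distances and an angle-wise loss on the cosines of triplet angles, both with a Huber penalty (the loss weights are placeholders):

```python
import torch
import torch.nn.functional as F

def pairwise_distances(e):
    """(B, D) embeddings -> (B, B) Euclidean distances, normalised by their mean."""
    d = torch.cdist(e, e, p=2)
    return d / (d[d > 0].mean() + 1e-8)

def angle_cosines(e):
    """(B, D) -> (B, B, B): cosine of the angle at vertex j for each triplet (i, j, k)."""
    diff = F.normalize(e.unsqueeze(0) - e.unsqueeze(1), p=2, dim=2)  # diff[i, j]: unit vector from e_i to e_j
    return torch.einsum("ijd,kjd->ijk", diff, diff)

def rkd_loss(student_emb, teacher_emb, w_dist=1.0, w_angle=2.0):
    t = teacher_emb.detach()
    loss_d = F.smooth_l1_loss(pairwise_distances(student_emb), pairwise_distances(t))
    loss_a = F.smooth_l1_loss(angle_cosines(student_emb), angle_cosines(t))
    return w_dist * loss_d + w_angle * loss_a
```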

Data Distillation: Towards Omni-Supervised Learning (CVPR 2018)

  • The teacher and student may or may not share the same architecture; the teacher network's outputs on differently transformed copies of a sample are ensembled

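A sketch of the label-generation step: run the teacher on several transformed copies of an unlabeled input, map each prediction back to the original geometry, and average; the ensembled output then supervises the student. The transform/inverse pairs below are illustrative assumptions (they make sense for dense predictions).

```python
import torch

def ensembled_teacher_output(teacher, image, transforms, inverses):
    """Average the teacher's predictions over multiple input transforms,
    after undoing each transform on the prediction."""
    preds = []
    with torch.no_grad():
        for t, inv in zip(transforms, inverses):
            preds.append(inv(teacher(t(image))))
    return torch.stack(preds).mean(dim=0)   # pseudo label for the student

# Example with identity + horizontal flip (flip the prediction back as well):
# transforms = [lambda x: x, lambda x: torch.flip(x, dims=[-1])]
# inverses   = [lambda y: y, lambda y: torch.flip(y, dims=[-1])]
```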

Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results (NIPS 2017)

  • A semi-supervised method: the teacher model's weights are an exponential moving average of the student model's current and earlier weights, and a consistency constraint is imposed between the two models' predictions

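A sketch of the two ingredients: the EMA weight update that defines the teacher, and a consistency term between teacher and student predictions (MSE on softmax outputs is one common choice; the smoothing factor is a placeholder):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def ema_update(teacher, student, alpha=0.999):
    """Teacher parameters = exponential moving average of the student's parameters."""
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(alpha).add_(s_param, alpha=1 - alpha)

def consistency_loss(student_logits, teacher_logits):
    """Student predictions should agree with the EMA teacher's on both labeled
    and unlabeled samples; the teacher is never backpropagated through."""
    return F.mse_loss(F.softmax(student_logits, dim=1),
                      F.softmax(teacher_logits.detach(), dim=1))
```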

Fitting relations within the features themselves

Knowledge Adaptation for Efficient Semantic Segmentation (CVPR 2019)

  • The teacher model's features are transformed with an auto-encoder, and the student model uses an adaptation unit to adapt to the teacher's features

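A loose sketch of the two pieces named in the bullet: an auto-encoder that compresses the teacher's features into a compact code, and an adaptation unit that maps student features into that code space (the layer shapes and channel counts are assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureAutoEncoder(nn.Module):
    """Compresses teacher features to a code (trained separately to reconstruct them)."""
    def __init__(self, teacher_channels=2048, code_channels=128):
        super().__init__()
        self.encoder = nn.Conv2d(teacher_channels, code_channels, kernel_size=1)
        self.decoder = nn.Conv2d(code_channels, teacher_channels, kernel_size=1)

    def forward(self, feat):
        code = self.encoder(feat)
        return code, self.decoder(code)

class AdaptationUnit(nn.Module):
    """Maps student features into the teacher's code space."""
    def __init__(self, student_channels=512, code_channels=128):
        super().__init__()
        self.adapt = nn.Sequential(
            nn.Conv2d(student_channels, code_channels, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(code_channels, code_channels, kernel_size=1))

    def forward(self, feat):
        return self.adapt(feat)

def adaptation_loss(auto_encoder, adapter, teacher_feat, student_feat):
    """The adapted student features regress the teacher's compressed code."""
    with torch.no_grad():
        code, _ = auto_encoder(teacher_feat)
    return F.mse_loss(adapter(student_feat), code)
```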

Structured Knowledge Distillation for Semantic Segmentation (CVPR 2019)

  • Combines soft targets with a GAN that is used to match higher-level information

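A sketch of the pixel-wise soft-target part of the loss for segmentation; the GAN-based matching of higher-level information mentioned above is not shown here:

```python
import torch
import torch.nn.functional as F

def pixelwise_distillation_loss(student_logits, teacher_logits, T=1.0):
    """Treat every spatial location as its own classification problem and apply
    a KL soft-target loss between student and teacher class distributions."""
    log_p_s = F.log_softmax(student_logits / T, dim=1)          # (B, C, H, W)
    p_t = F.softmax(teacher_logits.detach() / T, dim=1)
    kl = F.kl_div(log_p_s, p_t, reduction="none").sum(dim=1)    # per-pixel KL -> (B, H, W)
    return kl.mean() * (T ** 2)
```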


Source: www.cnblogs.com/lart/p/11505544.html