Interpretation of the paper: Self-Distillation from the Last Mini-Batch for Consistency Regularization

1. Basic information of the paper

  • Paper: Self-Distillation from the Last Mini-Batch for Consistency Regularization
  • Address: https://arxiv.org/pdf/2203.16172.pdf
  • Code: https://github.com/Meta-knowledge-Lab/DLB
  • Conference: CVPR2022

2. Background and Abstract

Knowledge distillation has already been studied extensively. It is essentially a regularization method: in image classification tasks, adding distillation usually lowers training accuracy, while evaluation accuracy generally improves as long as the hyper-parameters are set appropriately.

Distilling from a separate teacher model generally demands considerable computing power and a rather cumbersome two-stage pipeline. Previous self-distillation strategies usually required changing the model structure, for example by adding attention blocks or dropout. This paper improves on these self-distillation strategies: it distills by enforcing consistency between the predictions made on the same data in consecutive mini-batches, without a teacher or any architectural change, and proposes the DLB (self-Distillation from the Last mini-Batch) method, which achieves state-of-the-art results.
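To make the idea concrete, here is a minimal sketch of such a consistency-distillation loss in PyTorch. It is not the official DLB implementation: the function name dlb_loss and the default values of lambda_dlb and temperature are illustrative assumptions; only the general form (cross-entropy plus a temperature-scaled KL term against the previous iteration's soft predictions) follows the paper's description.

```python
import torch.nn.functional as F

def dlb_loss(logits_cur, labels, logits_last, lambda_dlb=1.0, temperature=3.0):
    """Cross-entropy on the current predictions plus a temperature-scaled KL
    term that pulls them toward the soft targets produced for the same
    samples in the previous iteration (the "last mini-batch")."""
    ce = F.cross_entropy(logits_cur, labels)
    # Soft targets from the last iteration are constants (no gradient flows back).
    soft_targets = F.softmax(logits_last.detach() / temperature, dim=1)
    log_probs = F.log_softmax(logits_cur / temperature, dim=1)
    kl = F.kl_div(log_probs, soft_targets, reduction="batchmean") * temperature ** 2
    return ce + lambda_dlb * kl
```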

3. DLB method flow chart

The flow chart of the DLB method in the paper can be summarized as follows. At each iteration t, the mini-batch is made up of two halves, b_t and b_{t+1}. The half b_{t+1} appears again in the next iteration, so the smoothed (soft) predictions generated on it at iteration t are cached and reused as distillation targets when the same samples are seen at iteration t+1. The training loss is the usual cross-entropy on the whole batch plus a temperature-scaled KL-divergence consistency term computed only on the shared half.
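The training procedure can be sketched as below. This is an illustrative loop rather than the official repository code: the toy model, dataset, and hyper-parameters are placeholders, and a real DLB sampler would arrange the loader so that consecutive batches actually share half of their samples.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

# Placeholder model and data; a DLB sampler would make the second half of
# batch t reappear as the first half of batch t + 1.
model = nn.Linear(32, 10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
dataset = TensorDataset(torch.randn(256, 32), torch.randint(0, 10, (256,)))
loader = DataLoader(dataset, batch_size=64)

T, lambda_dlb = 3.0, 1.0   # temperature and weight of the consistency term (illustrative)
cached = None              # soft predictions kept from the last mini-batch

for inputs, labels in loader:
    logits = model(inputs)
    half = inputs.size(0) // 2
    loss = F.cross_entropy(logits, labels)
    if cached is not None:
        # Pull current predictions on the shared half toward the (detached)
        # predictions produced for the same samples in the previous iteration.
        kl = F.kl_div(F.log_softmax(logits[:half] / T, dim=1),
                      F.softmax(cached / T, dim=1),
                      reduction="batchmean") * T * T
        loss = loss + lambda_dlb * kl
    cached = logits[half:].detach()  # these samples come back next iteration
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```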

Origin blog.csdn.net/u012526003/article/details/124560997