Paper notes: Learning Efficient Object Detection Models with Knowledge Distillation

Paper address: http://papers.nips.cc/paper/6676-learning-efficient-object-detection-models-with-knowledge-distillation
GitHub address: none

Motivation

This paper proposes a knowledge distillation compression algorithm for object detection networks. Most previous knowledge distillation work targets classification networks: it can compress models and speed them up while retaining accuracy, but it has only been validated on classification tasks, and the more complex object detection setting has yet to be explored. Object detection poses several specific challenges:

  • Detection labels carry more information, so the model learned from them is more complex and loses more accuracy after compression
  • In classification, the classes are relatively balanced and equally important, whereas detection suffers from class imbalance with an overwhelming number of background samples
  • Detection is a more complex task, involving both category classification and bounding-box regression
  • Current knowledge distillation mainly transfers knowledge within a single domain; cross-domain detection places higher demands on distillation

In response to these challenges, the authors propose an end-to-end knowledge-transfer framework for detection; the issues of limited labels, class imbalance, and regression loss are addressed by combining FitNets-style hint learning with new losses.

Methods

The authors take the Faster R-CNN framework as an example and apply knowledge distillation to the backbone network, the RPN, and the RCN (detection head).
For the backbone network, hint learning from FitNets is used for distillation: an adaptation layer is added so that the student's feature maps match the dimensions of the teacher's.
For the classification outputs, a weighted cross-entropy loss is used to address the severe class imbalance.
For the regression outputs, in addition to the original smooth L1 loss, the authors propose a teacher bounded regression loss: the teacher's regression prediction serves as an upper bound, and if the student network's regression result is already better, the loss is 0.
The overall loss is

$$L_{RCN} = \frac{1}{N}\sum_i L_{cls}^{RCN} + \lambda \frac{1}{N}\sum_{i,j} L_{reg}^{RCN}, \qquad L_{RPN} = \frac{1}{M}\sum_i L_{cls}^{RPN} + \lambda \frac{1}{M}\sum_{i,j} L_{reg}^{RPN}$$

$$L = L_{RPN} + L_{RCN} + \gamma L_{Hint}$$

where:

  • $N$ and $M$ are the batch sizes of the corresponding parts; $\lambda$ and $\gamma$ are hyperparameters, set by the authors to 1 and 0.5 respectively
  • $L_{cls}$ includes the hard-target and soft-target (knowledge distillation) terms
  • $L_{reg}$ includes smooth L1 and the newly proposed teacher bounded L2 regression loss
  • $L_{Hint}$ is the hint loss on the backbone network (a minimal sketch follows this list)
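
A minimal PyTorch-style sketch of the hint loss with an adaptation layer (the class name `HintLoss`, the 1x1-conv adapter, and the feature sizes are illustrative assumptions, not the paper's exact implementation):

```python
import torch
import torch.nn as nn

class HintLoss(nn.Module):
    """L2 loss between adapted student features and teacher (hint) features."""
    def __init__(self, student_channels, teacher_channels):
        super().__init__()
        # 1x1 conv adaptation layer so the student feature map matches the teacher's channel dimension
        self.adapter = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)

    def forward(self, student_feat, teacher_feat):
        # the teacher's features are fixed targets (no gradient flows back to the teacher)
        return ((self.adapter(student_feat) - teacher_feat.detach()) ** 2).mean()

# usage sketch with dummy feature maps of the same spatial size
hint = HintLoss(student_channels=256, teacher_channels=512)
s_feat = torch.randn(2, 256, 38, 50)  # student backbone feature map
t_feat = torch.randn(2, 512, 38, 50)  # teacher backbone feature map
loss_hint = hint(s_feat, t_feat)
```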

Because the background class is often misclassified with high probability in the classification loss, the authors increase the weight of the background class in the distillation cross entropy to handle the imbalance: a class weight $w_c$ is added, with $w_0 = 1.5$ for the background class and $w_i = 1$ for all other classes.
For the temperature scaling in the KD loss, the authors set it to 1.
The weighted classification loss is

$$L_{cls} = \mu L_{hard}(P_s, y) + (1-\mu)\, L_{soft}(P_s, P_t), \qquad L_{soft}(P_s, P_t) = -\sum_c w_c\, P_t \log P_s$$

where $P_s$ and $P_t$ are the student's and teacher's class probabilities, $y$ is the ground-truth label, and $\mu$ balances the hard and soft terms.
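
A minimal PyTorch sketch of this weighted soft-target cross entropy (the function name `weighted_kd_cls_loss`, the default `mu`, and the assumption that class index 0 is the background are illustrative, not taken from the paper):

```python
import torch
import torch.nn.functional as F

def weighted_kd_cls_loss(student_logits, teacher_logits, labels,
                         num_classes, mu=0.5, w_bg=1.5, temperature=1.0):
    """Hard cross entropy plus class-weighted soft cross entropy against the teacher."""
    # per-class weights: heavier weight on the background class (index 0 assumed), 1 elsewhere
    w = torch.ones(num_classes, device=student_logits.device)
    w[0] = w_bg

    p_t = F.softmax(teacher_logits / temperature, dim=1)          # teacher soft targets
    log_p_s = F.log_softmax(student_logits / temperature, dim=1)  # student log-probabilities

    soft = -(w * p_t * log_p_s).sum(dim=1).mean()   # weighted soft-target cross entropy
    hard = F.cross_entropy(student_logits, labels)  # standard hard-target cross entropy
    return mu * hard + (1.0 - mu) * soft

# usage sketch: 20 object classes + background
s = torch.randn(8, 21)
t = torch.randn(8, 21)
y = torch.randint(0, 21, (8,))
loss_cls = weighted_kd_cls_loss(s, t, y, num_classes=21)
```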
For distilling the regression outputs, the regression output is unbounded, and the teacher's prediction may even point in the opposite direction from the ground truth. The authors therefore use the teacher's regression loss as an upper bound: the extra distillation term is counted only when the student network's regression loss exceeds this bound, and is zero otherwise.
The regression loss is

$$L_b(R_s, R_t, y) = \begin{cases} \lVert R_s - y \rVert_2^2, & \text{if } \lVert R_s - y \rVert_2^2 + m > \lVert R_t - y \rVert_2^2 \\ 0, & \text{otherwise} \end{cases}$$

$$L_{reg} = L_{sL1}(R_s, y_{reg}) + \nu\, L_b(R_s, R_t, y_{reg})$$

where $R_s$ and $R_t$ are the student's and teacher's regression outputs, $m$ is a margin, and $\nu$ weights the bounded term.
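
A minimal PyTorch sketch of this teacher bounded regression loss (the function name, the default margin `m`, and the weight `nu` are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

def teacher_bounded_regression_loss(student_reg, teacher_reg, target, m=0.0, nu=0.5):
    """Smooth L1 to the ground truth plus an L2 term that is active only when
    the student's error does not beat the teacher's error (within margin m)."""
    student_err = ((student_reg - target) ** 2).sum(dim=1)  # per-box squared L2 error of the student
    teacher_err = ((teacher_reg - target) ** 2).sum(dim=1)  # per-box squared L2 error of the teacher

    # keep the bounded term only for boxes where the student is not better than the teacher
    active = (student_err + m > teacher_err).float()
    bounded_l2 = (active * student_err).mean()

    smooth_l1 = F.smooth_l1_loss(student_reg, target)
    return smooth_l1 + nu * bounded_l2

# usage sketch: 8 boxes with 4 regression targets each
r_s = torch.randn(8, 4)
r_t = torch.randn(8, 4)
y = torch.randn(8, 4)
loss_reg = teacher_bounded_regression_loss(r_s, r_t, y)
```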

Experiments

Datasets: KITTI, PASCAL VOC 2007, MS COCO, and the ImageNet DET benchmark (ILSVRC 2014)
Teacher networks (i.e., the backbone part): AlexNet, AlexNet with Tucker decomposition, VGG16, and VGGM

Results

Result tables from the paper (not reproduced here).

Thoughts

This article applies knowledge distillation to an object detection network, distilling at two levels: hint learning and output learning. The distillation of the regression outputs and the class weighting used to address imbalance are the most inspiring parts for me; I had not previously considered the unbounded nature of regression outputs when distilling them.

Origin: blog.csdn.net/qq_43812519/article/details/106183358