Distilling Object Detectors with Fine-grained Feature Imitation reading notes

Article link: paper pdf

Current CNN-based detection models are generally complex and consume substantial computing resources. One solution is knowledge distillation: a complex network serves as the "teacher", and a small "student" network is trained to mimic the teacher's output, so that the small network achieves better performance. So far, however, knowledge distillation has mostly been applied to relatively simple tasks such as classification, and rarely to object detection. The authors tried applying classification-style knowledge distillation directly to detection networks, but it performed poorly. They therefore propose a fine-grained feature imitation method that exploits the differences between the feature responses at different locations. The authors argue that the feature responses at anchor locations close to objects carry important information about how the teacher network tends to generalize. The paper presents a method to estimate these near-object anchor locations, and then has the student network imitate the teacher network's feature responses at those locations to obtain better performance.

The main method

1. Imitation region estimation

Objective: identify the anchor locations near the objects.

Method: the ground-truth boxes are used. For each ground-truth box, the IOU between it and every anchor is computed, giving a W * H * K IOU map m (W is the width of the feature map, H is its height, and K is the number of anchors). The maximum value M = max(m) is taken, and the threshold F is computed from M:

$$F = \psi \cdot M$$

where ψ is a threshold factor. Locations whose IOU is below F are filtered out, and the results for the K anchors are combined with an OR operation along the anchor dimension to give a W * H mask. This is repeated for every ground-truth box, and the per-box masks are combined to obtain the final fine-grained imitation mask.
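The region estimation described above can be sketched in NumPy as follows. This is only an illustrative sketch under some assumptions: boxes are in (x1, y1, x2, y2) format with positive area, the array layout is (H, W, K) rather than W * H * K, the threshold factor is called `psi` with a placeholder value of 0.5, and `make_anchors` is a hypothetical helper that builds square anchors for the demo, not the anchor scheme of any particular detector.

```python
import numpy as np

def make_anchors(H, W, stride, sizes):
    """Hypothetical helper: square anchors centred on each cell, shape (H, W, K, 4)."""
    ys = (np.arange(H) + 0.5) * stride
    xs = (np.arange(W) + 0.5) * stride
    cy, cx = np.meshgrid(ys, xs, indexing="ij")      # each (H, W)
    half = np.asarray(sizes, dtype=float) / 2.0      # (K,)
    return np.stack([cx[..., None] - half, cy[..., None] - half,
                     cx[..., None] + half, cy[..., None] + half], axis=-1)

def iou_one_to_many(box, anchors):
    """IOU between one box (4,) and anchors (..., 4), all as (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], anchors[..., 0])
    y1 = np.maximum(box[1], anchors[..., 1])
    x2 = np.minimum(box[2], anchors[..., 2])
    y2 = np.minimum(box[3], anchors[..., 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_box = (box[2] - box[0]) * (box[3] - box[1])
    area_anc = (anchors[..., 2] - anchors[..., 0]) * (anchors[..., 3] - anchors[..., 1])
    return inter / (area_box + area_anc - inter)

def imitation_mask(gt_boxes, anchors, psi=0.5):
    """Boolean (H, W) mask of locations whose anchors lie near any ground-truth box."""
    H, W, K, _ = anchors.shape
    mask = np.zeros((H, W), dtype=bool)
    for box in gt_boxes:
        m = iou_one_to_many(np.asarray(box, dtype=float), anchors)  # (H, W, K) IOU map
        F = psi * m.max()                  # threshold from the largest IOU
        mask |= (m > F).any(axis=-1)       # OR over the K anchors, then over boxes
    return mask

anchors = make_anchors(H=4, W=4, stride=16, sizes=(16, 32))
mask = imitation_mask([(16, 16, 32, 32)], anchors)
print(mask.astype(int))
```

In this toy setup the ground-truth box coincides with one size-16 anchor, so only that cell is selected. A smaller `psi` would widen the imitation region.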

2. Fine-grained feature imitation

The distance between the student and teacher feature responses is not computed directly. Instead, a fully convolutional adaptation layer is appended to the student network, for two reasons: (1) it makes the number of channels in the student's feature response match that of the teacher's; (2) doing so was found to improve performance. The distance between the teacher and student feature responses is then computed only at locations near the objects, i.e. the locations estimated above:

$$l = \sum_{c=1}^{C} \left( f_{\mathrm{adap}}(s)_{ijc} - t_{ijc} \right)^2$$

where (i, j) is the position on the feature map, c is the channel, s and t are the student and teacher feature responses, and f_adap is the adaptation layer.
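A rough NumPy sketch of this step is given below. It assumes, purely for illustration, that the adaptation layer is a 1x1 convolution (a per-location linear map over channels) and that features are stored as (H, W, C); the notes do not state the kernel size, and the names `adapt_1x1` and `per_location_distance` are hypothetical.

```python
import numpy as np

def adapt_1x1(student, weight, bias):
    """Assumed 1x1-conv adaptation: (H, W, Cs) @ (Cs, Ct) + (Ct,) -> (H, W, Ct)."""
    return student @ weight + bias

def per_location_distance(adapted, teacher):
    """Squared distance summed over channels at every (i, j), shape (H, W)."""
    return ((adapted - teacher) ** 2).sum(axis=-1)

rng = np.random.default_rng(0)
student = rng.normal(size=(4, 6, 8))     # student response with 8 channels
teacher = rng.normal(size=(4, 6, 16))    # teacher response with 16 channels
weight = rng.normal(size=(8, 16)) * 0.1  # adaptation weights (learned in practice)
bias = np.zeros(16)

dist = per_location_distance(adapt_1x1(student, weight, bias), teacher)
print(dist.shape)  # (4, 6)
```

The adaptation layer is what allows the subtraction at all here, since the student and teacher channel counts differ.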

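Over all estimated locations, the distance between the student and teacher feature responses is minimized, i.e. the objective is to minimize:

$$L_{\mathrm{imitation}} = \frac{1}{2 N_p} \sum_{i=1}^{W} \sum_{j=1}^{H} \sum_{c=1}^{C} I_{ij} \left( f_{\mathrm{adap}}(s)_{ijc} - t_{ijc} \right)^2, \qquad N_p = \sum_{i=1}^{W} \sum_{j=1}^{H} I_{ij}$$

where I is the imitation mask estimated in step 1, N_p is the number of selected locations, s and t are the student and teacher feature responses, and f_adap is the adaptation layer.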

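The final loss used to train the student network is:

$$L = L_{gt} + \lambda L_{\mathrm{imitation}}$$

where L_gt is the detection loss on the ground truth, L_imitation is the imitation loss minimized above, and λ is a balancing weight. In other words, the student network is trained jointly on the ground-truth targets and on the feature-response difference near the objects.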
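As a minimal sketch of how the two terms might be combined, the snippet below computes a masked imitation loss and adds it to a detection loss. It assumes the imitation term is normalized by twice the number of selected locations and weighted by a factor `lam`; the value of `lam`, the (H, W, C) layout, and the function names are illustrative assumptions, and the detection loss is just a placeholder number.

```python
import numpy as np

def imitation_loss(adapted_student, teacher, mask):
    """Masked squared distance between (H, W, C) features, normalized by 2 * N_p."""
    n_p = int(mask.sum())                 # number of selected locations
    if n_p == 0:
        return 0.0                        # no near-object locations in this image
    sq = ((adapted_student - teacher) ** 2).sum(axis=-1)   # (H, W)
    return float((sq * mask).sum() / (2.0 * n_p))

def total_loss(detection_loss, imit, lam=0.01):
    """Detection loss on ground truth plus the weighted imitation term."""
    return detection_loss + lam * imit

adapted = np.ones((2, 2, 3))              # adapted student response (toy values)
teacher = np.zeros((2, 2, 3))             # teacher response (toy values)
mask = np.array([[True, False],
                 [False, True]])          # imitation mask with two selected cells

imit = imitation_loss(adapted, teacher, mask)
print(imit)                               # 1.5
print(total_loss(1.0, imit, lam=0.1))
```

Locations outside the mask contribute nothing, so the student is only pushed toward the teacher near objects.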

 


Origin www.cnblogs.com/yangruicvpr/p/11720222.html