Why is SSD (Single Shot MultiBox Detector) not good at detecting small objects?

Reprint URL: https://www.zhihu.com/question/49455386

SSD is essentially a class-aware RPN with a lot of bells and whistles.

Each pixel on a feature map corresponds to several anchors, and the network learns its features by training on those anchors.

That is the background.

A small object is matched to relatively few anchors (that is, anchors whose overlap with the gt exceeds 0.5), which means the corresponding feature-map pixels are hard to train adequately. You can picture it this way: a large ROI may cover many anchors, so all of those anchors get a chance to be trained, whereas a small object cannot cover many anchors.
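To make the counting argument concrete, here is a rough sketch that counts how many anchors overlap a ground-truth box by more than 0.5. The anchor layout (a 300-pixel input, the grid sizes and scales in LAYERS, and three aspect ratios) is a simplified assumption loosely modeled on SSD300, not the exact configuration, and the names iou, make_anchors, count_matched_anchors and centered_box are made up for this illustration.

```python
import math

IMAGE_SIZE = 300
# (grid size, anchor scale in pixels) per feature map; assumed, SSD300-like values.
LAYERS = [(38, 30), (19, 60), (10, 111), (5, 162), (3, 213), (1, 264)]
ASPECT_RATIOS = (1.0, 2.0, 0.5)


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = min(a[2], b[2]) - max(a[0], b[0])
    inter_h = min(a[3], b[3]) - max(a[1], b[1])
    if inter_w <= 0 or inter_h <= 0:
        return 0.0
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def make_anchors():
    """Place anchors of every aspect ratio at the center of every grid cell."""
    anchors = []
    for grid, scale in LAYERS:
        stride = IMAGE_SIZE / grid
        for row in range(grid):
            for col in range(grid):
                cx = (col + 0.5) * stride
                cy = (row + 0.5) * stride
                for ratio in ASPECT_RATIOS:
                    w = scale * math.sqrt(ratio)
                    h = scale / math.sqrt(ratio)
                    anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


def count_matched_anchors(gt_box, anchors, threshold=0.5):
    """Count anchors whose IoU with the ground-truth box exceeds the threshold."""
    return sum(1 for anchor in anchors if iou(anchor, gt_box) > threshold)


def centered_box(side):
    """A square ground-truth box of the given side, centered in the image."""
    c = IMAGE_SIZE / 2
    return (c - side / 2, c - side / 2, c + side / 2, c + side / 2)


ANCHORS = make_anchors()
small_matches = count_matched_anchors(centered_box(20), ANCHORS)
large_matches = count_matched_anchors(centered_box(150), ANCHORS)
print("20x20 object:", small_matches, "anchors above 0.5 IoU")
print("150x150 object:", large_matches, "anchors above 0.5 IoU")
```

Under these assumptions the 20x20 box is smaller than even the smallest anchor, so no anchor clears the 0.5 threshold, while the 150x150 box clears it on several anchors. SSD's matching step also assigns each ground-truth box its single best-overlapping anchor, which this sketch leaves out, so in practice a small object still trains at least one anchor, just far fewer than a large one.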

What goes wrong when a pixel is not adequately trained? At test time, the predictions at that pixel may be erratic, which can seriously interfere with the correct results.

The reason SSD's data augmentation brings such a large improvement is that random cropping lets every anchor be adequately trained (that is, when a crop contains a small object, that object becomes a large object in the new image).
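The cropping effect is just arithmetic, and a small sketch shows it. It assumes the crop is resized back to a fixed 300-pixel input and that the object lies fully inside the crop; the function name box_after_crop_and_resize and the example coordinates are made up for illustration and are not SSD's actual augmentation code.

```python
def box_after_crop_and_resize(box, crop, out_size=300):
    """Map an (x1, y1, x2, y2) box into a crop that is resized to out_size.

    Assumes the box lies fully inside the crop, so no clipping is applied.
    """
    crop_w = crop[2] - crop[0]
    crop_h = crop[3] - crop[1]
    scale_x = out_size / crop_w
    scale_y = out_size / crop_h
    return (
        (box[0] - crop[0]) * scale_x,
        (box[1] - crop[1]) * scale_y,
        (box[2] - crop[0]) * scale_x,
        (box[3] - crop[1]) * scale_y,
    )


# A 20x20 object in a 300x300 image, cropped to a 100x100 window around it.
small_box = (140, 140, 160, 160)
crop_window = (100, 100, 200, 200)
new_box = box_after_crop_and_resize(small_box, crop_window)
new_side = new_box[2] - new_box[0]
print("object side after crop and resize:", new_side)
```

Here the crop is one third of the image on each side, so after resizing the object is three times larger on each side, which puts it in the range of anchors that a 20x20 object would never reach.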

All one can say is that a method without region proposals is naturally weak on small objects; only by stacking up hacks can it gradually catch up.


Author: Oh233
Link: https://www.zhihu.com/question/49455386/answer/146923342
Source: Zhihu
The copyright belongs to the author. For commercial reprints, please contact the author for authorization, and for non-commercial reprints, please indicate the source.

