Object detection with deep learning (2): SSD



Foreword

This article summarizes the basic principles of SSD, an object detection network.



1. SSD

Original paper: SSD: Single Shot MultiBox Detector
SSD was published by Wei Liu et al. at ECCV 2016. With a 300×300 input, it achieves 74.3% mAP on the VOC 2007 test set at 59 FPS on an Nvidia Titan X, which is truly real-time. With a 512×512 input, it reaches 76.9% mAP, surpassing the strongest Faster R-CNN result of 73.2% mAP.

Problems with Faster R-CNN

1. Poor detection of small objects
2. A large model and slow detection speed

Overall structure:
[Figure: overall SSD network structure]

SSD extracts multiple feature layers and predicts on each of them. Relatively small objects are predicted on lower-level feature layers, which retain more detailed information. Larger objects are detected on higher-level feature layers, where they are easier to match with the corresponding bounding boxes.

1.1 Scale and aspect ratio settings of the default boxes

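For the m feature layers used for prediction, the scale of the default boxes on the k-th layer is

$$s_k = s_{\min} + \frac{s_{\max} - s_{\min}}{m - 1}(k - 1), \quad k \in [1, m]$$

where $s_{\min} = 0.2$ and $s_{\max} = 0.9$. The aspect ratios are $a_r \in \{1, 2, 3, \tfrac{1}{2}, \tfrac{1}{3}\}$. Each default box has width $w_k^a = s_k\sqrt{a_r}$ and height $h_k^a = s_k/\sqrt{a_r}$, both relative to the image size.
For aspect ratio 1, the paper also adds one more default box, whose scale is

$$s'_k = \sqrt{s_k s_{k+1}}$$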
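As a worked example, the numbers below plug values into the paper's scale formula $s_k = s_{\min} + \frac{s_{\max} - s_{\min}}{m - 1}(k - 1)$. They assume $s_{\min} = 0.2$, $s_{\max} = 0.9$, m = 6 feature layers and a 300×300 input. They also assume every layer follows this formula uniformly. Actual implementations may use different per-layer sizes, so treat these values as illustrative only.

```latex
\begin{aligned}
s_k &= 0.2 + \frac{0.9 - 0.2}{6 - 1}(k - 1) = 0.2 + 0.14\,(k - 1) \\
(s_1, \dots, s_6) &= (0.20,\ 0.34,\ 0.48,\ 0.62,\ 0.76,\ 0.90) \\
300 \cdot (s_1, \dots, s_6) &= (60,\ 102,\ 144,\ 186,\ 228,\ 270)\ \text{pixels} \\
s'_1 &= \sqrt{s_1 s_2} = \sqrt{0.2 \times 0.34} \approx 0.261 \\
(w, h)\big|_{k = 1,\ a_r = 2} &= \left(s_1\sqrt{2},\ s_1/\sqrt{2}\right) \approx (0.283,\ 0.141)
\end{aligned}
```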
The paper uses 4 default boxes per location on Conv4_3, Conv10_2 and Conv11_2, omitting the aspect ratios 3 and 1/3. It uses 6 default boxes per location on the other feature layers.
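The settings for each feature layer are shown in the figure below:

[Figure: default box settings for each feature layer]

Summing over the six feature layers of SSD300 gives 38×38×4 + 19×19×6 + 10×10×6 + 5×5×6 + 3×3×4 + 1×1×4 = 8732 default boxes in total.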
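The sketch below generates every default box of SSD300 and checks the total count. It is a minimal sketch under several assumptions:
- The feature map sizes are 38, 19, 10, 5, 3 and 1, and box centers sit in the middle of each cell.
- The paper's scale formula applies uniformly to all six layers.
- The extra 1:1 box on the last layer uses an extrapolated $s_{m+1}$.

The names `LAYERS` and `generate_default_boxes` are hypothetical.

```python
import math

# Assumed SSD300 layout: (layer name, feature map side length,
# aspect ratios used in addition to the extra 1:1 box)
LAYERS = [
    ("Conv4_3", 38, [1, 2, 1 / 2]),
    ("Conv7", 19, [1, 2, 3, 1 / 2, 1 / 3]),
    ("Conv8_2", 10, [1, 2, 3, 1 / 2, 1 / 3]),
    ("Conv9_2", 5, [1, 2, 3, 1 / 2, 1 / 3]),
    ("Conv10_2", 3, [1, 2, 1 / 2]),
    ("Conv11_2", 1, [1, 2, 1 / 2]),
]


def generate_default_boxes(layers, s_min=0.2, s_max=0.9):
    """Return every default box as (cx, cy, w, h), normalized to the image size."""
    m = len(layers)
    # s_1..s_m from the scale formula; s_{m+1} is extrapolated (assumption)
    # so that the last layer also gets its extra sqrt(s_k * s_{k+1}) box.
    scales = [s_min + (s_max - s_min) * k / (m - 1) for k in range(m + 1)]
    boxes = []
    for k, (_, size, ratios) in enumerate(layers):
        s_k = scales[k]
        s_extra = math.sqrt(scales[k] * scales[k + 1])
        for row in range(size):
            for col in range(size):
                # Center of cell (row, col)
                cx, cy = (col + 0.5) / size, (row + 0.5) / size
                for a in ratios:
                    boxes.append((cx, cy, s_k * math.sqrt(a), s_k / math.sqrt(a)))
                boxes.append((cx, cy, s_extra, s_extra))
    return boxes


boxes = generate_default_boxes(LAYERS)
print(len(boxes))  # 8732
```

Layers with 3 aspect ratios yield 4 boxes per location, and layers with 5 ratios yield 6, which reproduces the 8732 total.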

1.2 Predictor

For a feature layer of size m×n with p channels, SSD predicts directly with 3×3 convolution kernels of depth p (3×3×p). Each kernel produces either a category score or an offset relative to a default box's coordinates.

For each default box, c category scores and 4 offsets relative to the default box are generated. With k default boxes at each location, an m×n feature layer therefore needs (c+4)×k kernels and produces (c+4)×k×m×n outputs in total. (The c categories include the background class.)
(Faster R-CNN, by contrast, generates a separate set of bounding box regression parameters for each category.)
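The arithmetic above can be checked with a short sketch. It assumes the SSD300 layer sizes and boxes-per-location counts from section 1.1, and c = 21 (for example, 20 PASCAL VOC classes plus background). The `class_specific_boxes` flag is a hypothetical variant that predicts 4 offsets per category, included only for contrast with the Faster R-CNN remark.

```python
# Assumed SSD300 prediction layers: (feature map side length, default boxes per location k)
SSD300_LAYERS = [(38, 4), (19, 6), (10, 6), (5, 6), (3, 4), (1, 4)]


def num_kernels(c, k, class_specific_boxes=False):
    """Number of 3x3xp kernels (output channels) in one prediction conv.

    c includes the background class. With class_specific_boxes=True (hypothetical
    variant), 4 offsets are predicted per category instead of 4 shared offsets.
    """
    box_params = 4 * c if class_specific_boxes else 4
    return (c + box_params) * k


def head_outputs(m, n, c, k, class_specific_boxes=False):
    """Total outputs for an m x n feature map, i.e. (c + 4) * k * m * n for SSD."""
    return num_kernels(c, k, class_specific_boxes) * m * n


c = 21  # e.g. 20 VOC classes + background
print(num_kernels(c, 4))           # 100 kernels for a layer with k = 4
print(head_outputs(38, 38, c, 4))  # 144400 outputs on the 38x38 layer
print(sum(head_outputs(s, s, c, k) for s, k in SSD300_LAYERS))  # 218300
```

The last total equals 8732 × 25: every default box gets 21 scores plus 4 offsets.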

1.3 Selection of positive and negative samples

Positive samples:
1) For each ground truth box, the default box with the largest IoU with it
2) Every default box whose IoU with any ground truth box is greater than 0.5
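The two matching rules can be sketched in plain Python as below. It assumes boxes in (xmin, ymin, xmax, ymax) format, and the function names are hypothetical. Each positive default box is assigned to the single ground truth with which it has the highest IoU. Rule 1 is applied last so that every ground truth is guaranteed at least one default box.

```python
def iou(a, b):
    """IoU of two boxes given as (xmin, ymin, xmax, ymax)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_default_boxes(defaults, gts, threshold=0.5):
    """For each default box, return the index of its matched ground truth, or -1 (negative)."""
    matches = [-1] * len(defaults)
    if not defaults or not gts:
        return matches
    ious = [[iou(d, g) for g in gts] for d in defaults]
    # Rule 2: a default box whose best IoU with any ground truth exceeds the threshold
    for i, row in enumerate(ious):
        best_j = max(range(len(gts)), key=lambda j: row[j])
        if row[best_j] > threshold:
            matches[i] = best_j
    # Rule 1: each ground truth claims the default box with the highest IoU,
    # even when that IoU is below the threshold
    for j in range(len(gts)):
        best_i = max(range(len(defaults)), key=lambda i: ious[i][j])
        matches[best_i] = j
    return matches


defaults = [(0, 0, 1, 1), (0.5, 0.5, 1.5, 1.5), (2, 2, 3, 3), (0, 0, 1.1, 1.1)]
gts = [(0, 0, 1, 1)]
print(match_default_boxes(defaults, gts))  # [0, -1, -1, 0]
```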

Negative samples (hard negative mining):
All default boxes that are not positive samples are candidate negatives. For each candidate, compute the confidence loss. The larger this loss, the more likely the network is to wrongly predict the box as containing an object. Sort the candidates by this loss in descending order and keep the top n, where n is three times the number of positive samples.

1.4 Loss

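The total loss is a weighted sum of the category (confidence) loss and the localization loss:

$$L(x, c, l, g) = \frac{1}{N}\left(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\right)$$

Here N is the number of positive (matched) default boxes and α weights the localization loss; the paper sets α = 1. When N = 0, the loss is set to 0.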

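Category loss (softmax cross-entropy over the positives and the negatives kept by hard negative mining):

$$L_{conf}(x, c) = -\sum_{i \in Pos}^{N} x_{ij}^{p} \log\left(\hat{c}_i^{p}\right) - \sum_{i \in Neg} \log\left(\hat{c}_i^{0}\right), \qquad \hat{c}_i^{p} = \frac{\exp(c_i^{p})}{\sum_{p} \exp(c_i^{p})}$$

Here $x_{ij}^{p} \in \{0, 1\}$ indicates whether default box i is matched to ground truth j of category p, and category 0 is the background.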
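The category loss can be sketched without any framework, as below. It assumes raw class scores (logits) per default box, background at class index 0, and index lists for the positives and the mined negatives. All names are hypothetical. The sketch returns the unnormalized sum; the division by N happens in the total loss.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax of a list of class scores."""
    mx = max(logits)
    log_sum = mx + math.log(sum(math.exp(z - mx) for z in logits))
    return [z - log_sum for z in logits]


def conf_loss(logits, labels, pos_idx, neg_idx):
    """L_conf: cross-entropy of positives (true class) plus negatives (background, class 0)."""
    loss = 0.0
    for i in pos_idx:
        loss -= log_softmax(logits[i])[labels[i]]
    for i in neg_idx:
        loss -= log_softmax(logits[i])[0]
    return loss


# Two boxes with uniform scores over 3 classes: each contributes log(3)
print(conf_loss([[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]], [1, 0], [0], [1]))
```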
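Localization loss (computed only for positive samples):

$$L_{loc}(x, l, g) = \sum_{i \in Pos}^{N} \sum_{m \in \{cx, cy, w, h\}} x_{ij}^{k}\, \mathrm{smooth}_{L1}\left(l_i^{m} - \hat{g}_j^{m}\right)$$

$$\hat{g}_j^{cx} = \frac{g_j^{cx} - d_i^{cx}}{d_i^{w}}, \quad \hat{g}_j^{cy} = \frac{g_j^{cy} - d_i^{cy}}{d_i^{h}}, \quad \hat{g}_j^{w} = \log\frac{g_j^{w}}{d_i^{w}}, \quad \hat{g}_j^{h} = \log\frac{g_j^{h}}{d_i^{h}}$$

Here l is the predicted offset, g the ground truth box, and d the default box. Each box is described by its center (cx, cy), width w and height h.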
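The localization loss and the final combination can be sketched as follows. The sketch assumes boxes in normalized (cx, cy, w, h) form and a precomputed `matches` list (-1 for negatives). All function names are hypothetical. Many open-source implementations additionally scale the encoded targets by "variance" factors; this sketch leaves that out and follows the plain encoding only.

```python
import math


def encode(gt, d):
    """Regression target g_hat of ground truth gt w.r.t. default box d, both (cx, cy, w, h)."""
    return (
        (gt[0] - d[0]) / d[2],
        (gt[1] - d[1]) / d[3],
        math.log(gt[2] / d[2]),
        math.log(gt[3] / d[3]),
    )


def smooth_l1(x):
    """0.5 * x^2 if |x| < 1, else |x| - 0.5."""
    return 0.5 * x * x if abs(x) < 1 else abs(x) - 0.5


def loc_loss(pred, defaults, gts, matches):
    """L_loc: smooth L1 between predicted offsets and encoded targets, positives only."""
    loss = 0.0
    for i, j in enumerate(matches):
        if j < 0:
            continue  # negatives contribute nothing to the localization loss
        target = encode(gts[j], defaults[i])
        loss += sum(smooth_l1(p - t) for p, t in zip(pred[i], target))
    return loss


def total_loss(l_conf, l_loc, num_pos, alpha=1.0):
    """L = (L_conf + alpha * L_loc) / N, defined as 0 when there are no positives."""
    return 0.0 if num_pos == 0 else (l_conf + alpha * l_loc) / num_pos


defaults = [(0.5, 0.5, 0.2, 0.2), (0.1, 0.1, 0.2, 0.2)]
gts = [(0.55, 0.5, 0.4, 0.2)]
pred = [(0.0, 0.0, 0.0, 0.0), (0.3, 0.3, 0.3, 0.3)]
# Only box 0 is positive; its target is (0.25, 0, log 2, 0)
l_loc = loc_loss(pred, defaults, gts, matches=[0, -1])
print(l_loc)
```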


Origin blog.csdn.net/weixin_43869415/article/details/121730457