Foreword
This article records the basic principles of the SSD object detection algorithm.
1. SSD
Original paper: SSD: Single Shot MultiBox Detector
SSD was published by Wei Liu et al. at ECCV 2016. With a 300*300 input, it achieves 74.3% mAP at 59 FPS on the VOC 2007 test set on an Nvidia Titan X (true real-time). With a 512*512 input, it reaches 76.9% mAP, surpassing the strongest Faster R-CNN result of 73.2% mAP.
Problems with Faster R-CNN:
1. Poor detection performance on small objects
2. Large model size and slow detection speed
Overall structure:
SSD extracts multiple feature layers and predicts relatively small objects on lower-level feature layers (which retain more detail), while detecting larger objects on higher-level feature layers (which are easier to match to correspondingly large bounding boxes).
1.1 Scale and aspect setting of Default Box
For aspect ratio = 1, the paper also adds an extra default box, whose scale is s'_k = sqrt(s_k * s_{k+1}).
For Conv4_3, Conv10_2, and Conv11_2, the paper uses 4 default boxes per location; the other feature layers use 6.
See the figure below for details:
The total number of default boxes is 8732.
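The scale schedule and the box count above can be verified with a short sketch (my own illustration, not the paper's code; the feature map sizes are those of SSD300):

```python
import math

# Scales from the paper: s_k = s_min + (s_max - s_min) * (k - 1) / (m - 1)
s_min, s_max = 0.2, 0.9
m = 6  # number of feature layers used for prediction
scales = [s_min + (s_max - s_min) * k / (m - 1) for k in range(m)]

# Extra scale for aspect ratio 1: s'_k = sqrt(s_k * s_{k+1})
extra_scales = [math.sqrt(scales[k] * scales[k + 1]) for k in range(m - 1)]

# Default-box count for SSD300: Conv4_3, Conv7, Conv8_2, Conv9_2, Conv10_2, Conv11_2
feature_map_sizes = [38, 19, 10, 5, 3, 1]
boxes_per_location = [4, 6, 6, 6, 4, 4]
total = sum(s * s * k for s, k in zip(feature_map_sizes, boxes_per_location))
print(total)  # 8732
```

The sum works out to 38²·4 + 19²·6 + 10²·6 + 5²·6 + 3²·4 + 1²·4 = 8732, matching the paper.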
1.2 Predictor
For an m*n feature layer with p channels, prediction is done by applying 3*3*p convolution kernels directly to the feature map; each kernel produces either a category score or a relative offset with respect to a default box.
For each default box, c category scores and 4 relative offsets are generated.
Therefore, for an m*n feature layer with k default boxes per location, a total of (c+4)*k*m*n outputs are generated. (The c categories include the background class.)
(In Faster R-CNN, by contrast, separate bounding-box regression parameters are generated for each category.)
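The output count can be checked with a small sketch (the layer size and class count below are assumed for illustration):

```python
# Assumed example values: the 19*19 Conv7 layer of SSD300 on Pascal VOC.
m, n = 19, 19   # feature map height and width
k = 6           # default boxes per location on this layer
c = 21          # 20 VOC classes + background

# (c + 4) outputs per default box: c class scores + 4 box offsets
num_outputs = (c + 4) * k * m * n
print(num_outputs)  # 54150
```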
1.3 Selection of positive and negative samples
Positive samples:
1) The default box with the largest IoU with each ground-truth box
2) Any default box with IoU > 0.5 with any ground-truth box
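The two matching rules above can be sketched as follows (my own illustration of the rules, not the paper's implementation; boxes are (x1, y1, x2, y2) tuples):

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def match(default_boxes, ground_truths, threshold=0.5):
    """Return indices of default boxes marked positive by the two rules."""
    positives = set()
    for gt in ground_truths:
        ious = [iou(db, gt) for db in default_boxes]
        # Rule 1: the best-matching default box for each ground truth
        positives.add(max(range(len(default_boxes)), key=ious.__getitem__))
        # Rule 2: any default box with IoU above the threshold
        positives.update(i for i, v in enumerate(ious) if v > threshold)
    return positives
```

Rule 1 guarantees every ground-truth box gets at least one positive match even when no default box clears the 0.5 threshold.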
Negative samples (hard negative mining):
For negative samples, compute the highest confidence loss for each sample that is not a positive (the larger this value, the more likely the network is to predict it as a positive), and select the top-ranked ones, keeping three times as many negatives as positives.
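A minimal sketch of this mining step, assuming a flat per-box confidence loss array (my own illustration, not the paper's code):

```python
import numpy as np

def hard_negative_mining(conf_loss, positive_mask, neg_pos_ratio=3):
    """conf_loss: (N,) per-default-box confidence loss.
    positive_mask: (N,) bool, True for positive samples.
    Returns a bool mask selecting the hardest negatives at a 3:1 ratio."""
    num_neg = neg_pos_ratio * int(positive_mask.sum())
    loss = conf_loss.astype(float).copy()
    loss[positive_mask] = -np.inf            # exclude positives from the ranking
    neg_idx = np.argsort(-loss)[:num_neg]    # highest-loss negatives first
    neg_mask = np.zeros_like(positive_mask)
    neg_mask[neg_idx] = True
    return neg_mask
```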
1.4 Loss
Category loss: softmax cross-entropy over the class confidences, computed over the positive samples and the mined negatives.
Positioning loss: Smooth L1 loss between the predicted box offsets and the ground-truth offsets, computed only for positive samples.
The total loss is L(x, c, l, g) = (1/N) * (L_conf(x, c) + α * L_loc(x, l, g)), where N is the number of matched default boxes and the weight α is set to 1 in the paper.
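Putting the two terms together, here is a simplified numpy sketch of the loss (assumed flat shapes; masks would come from the matching and mining steps described above):

```python
import numpy as np

def smooth_l1(x):
    """Smooth L1: 0.5 * x^2 for |x| < 1, |x| - 0.5 otherwise."""
    x = np.abs(x)
    return np.where(x < 1, 0.5 * x ** 2, x - 0.5)

def ssd_loss(cls_logits, cls_targets, loc_preds, loc_targets,
             pos_mask, neg_mask, alpha=1.0):
    """cls_logits: (N, c), cls_targets: (N,), loc_preds/loc_targets: (N, 4).
    pos_mask/neg_mask: (N,) bool masks from matching and hard negative mining."""
    # Numerically stable log-softmax, then cross-entropy per default box
    logits = cls_logits - cls_logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    ce = -log_probs[np.arange(len(cls_targets)), cls_targets]

    l_conf = ce[pos_mask | neg_mask].sum()                    # positives + mined negatives
    l_loc = smooth_l1(loc_preds[pos_mask]
                      - loc_targets[pos_mask]).sum()          # positives only
    n = max(int(pos_mask.sum()), 1)                           # N matched boxes
    return (l_conf + alpha * l_loc) / n
```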