FCOS: A Simple and Strong Anchor-free Object Detector


1. Background

1. Disadvantages of anchor-based detectors

(1). Anchor settings strongly affect the result, and these hyperparameters have to be tuned empirically for each project, which is difficult.

(2). The anchors are very dense and most of them are negative samples, which introduces class imbalance.

(3). Matching anchors to gt boxes requires computing IoU, which increases the computational cost.

2. Anchor-free detection frameworks

YOLOv1 can be regarded as an anchor-free method: the image is divided into a grid, and the grid cell that contains the center point of a target is responsible for predicting that target. YOLOv2 adopts anchors instead; because anchors are spread over the whole image, recall naturally rises. FCOS combines the strengths of both: it uses all points inside the gt box for regression, and it uses center-ness to suppress boxes predicted from points far from the target center.

2. Network Introduction

1. Network structure

The output head consists of three branches:

(1) Classification: size (W, H, C), outputs a score for each category.

(2) Regression: size (W, H, 4), outputs the distances from each point on the feature map to the top, bottom, left, and right sides of the box.

(3) Center-ness: size (W, H, 1), predicts how closely a point coincides with the center of the gt box, and is used to improve the quality of the detections. Each part is analyzed below.

2. Center sampling

For each gt bbox, first map it to every output layer, then use center_sample_radius×stride to compute the positive sample region of the gt bbox on each layer and the corresponding left/top/right/bottom targets.

Within the positive sample region of each output layer, traverse every point and check whether its max(left, top, right, bottom) target falls inside the range assigned to that layer. Points inside the range are treated as positives rather than background.

Center sampling has two benefits:

(1). It reduces the number of ambiguous targets, which largely resolves the problem of a feature point inside overlapping boxes not knowing which box to regress to.

(2). It reduces interference from label noise.

A box label usually encloses a lot of irrelevant area, and regressing a box from a point in that area is clearly wrong. For an airplane in the air, for example, the sky next to the airplane is irrelevant area.
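The center-sampling rule can be sketched as follows. This is a minimal illustration, not the reference implementation: the function name `center_sample_mask` is hypothetical, points are assumed to be given in image coordinates, and the default radius of 1.5 is an assumption about a typical `center_sample_radius` setting.

```python
def center_sample_mask(points, gt_box, stride, radius=1.5):
    """Return True for each point that lies in the center region of gt_box.

    points: list of (x, y) image-plane coordinates of feature-map locations.
    gt_box: (x0, y0, x1, y1) in image coordinates.
    The center region is a square of half-width radius * stride around the
    box center, clipped so it never extends past the gt box (assumption).
    """
    x0, y0, x1, y1 = gt_box
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    half = radius * stride
    # Clip the sampling square to the gt box.
    sx0, sy0 = max(cx - half, x0), max(cy - half, y0)
    sx1, sy1 = min(cx + half, x1), min(cy + half, y1)
    return [sx0 < x < sx1 and sy0 < y < sy1 for x, y in points]


# Hypothetical example: a 100x100 box on a stride-8 level.
points = [(52, 52), (20, 20), (60, 44)]
mask = center_sample_mask(points, gt_box=(0, 0, 100, 100), stride=8)
```

Here the point (20, 20) is inside the gt box but outside the center region, so it would not become a positive sample.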

3. FPN

Using FPN to predict objects of different scales on different feature-map levels alleviates the overlap problem (a feature point inside overlapping boxes not knowing which box to regress to).

If a point is still ambiguous after FPN and center sampling, it is assigned to the box with the smallest area.
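The smallest-box fallback can be sketched like this. The helper `pick_smallest_box` is a hypothetical name, and boxes are assumed to be (x0, y0, x1, y1) tuples in image coordinates.

```python
def pick_smallest_box(point, gt_boxes):
    """Return the index of the smallest-area gt box containing point, or -1."""
    x, y = point
    best_idx, best_area = -1, float("inf")
    for idx, (x0, y0, x1, y1) in enumerate(gt_boxes):
        if x0 <= x <= x1 and y0 <= y <= y1:
            area = (x1 - x0) * (y1 - y0)
            if area < best_area:
                best_idx, best_area = idx, area
    return best_idx


# Hypothetical example: a small box nested inside a large one.
boxes = [(0, 0, 100, 100), (40, 40, 60, 60)]
inside_both = pick_smallest_box((50, 50), boxes)   # ambiguous point
inside_one = pick_smallest_box((10, 10), boxes)    # only in the large box
outside = pick_smallest_box((200, 200), boxes)     # background
```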

At the same time, the regression range of each FPN level is constrained:

P3: regression range is [0,64], P4: regression range is [64,128], P5: regression range is [128,256],

P6: regression range is [256,512], P7: regression range is [512,inf]. This makes different levels predict targets of different sizes.
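A minimal sketch of the per-level range check. The table `REGRESS_RANGES` and the function `levels_for_target` are hypothetical names; the ranges are hard-coded with P7 assumed to start at 512 so that the ranges are contiguous, and both ends of each range are treated as inclusive (an assumption, so a boundary value can match two levels).

```python
import math

# (lower, upper) bound on max(l, t, r, b) for each FPN level.
REGRESS_RANGES = {
    "P3": (0, 64),
    "P4": (64, 128),
    "P5": (128, 256),
    "P6": (256, 512),
    "P7": (512, math.inf),
}


def levels_for_target(l, t, r, b):
    """Return the FPN levels whose range contains max(l, t, r, b)."""
    m = max(l, t, r, b)
    return [name for name, (lo, hi) in REGRESS_RANGES.items() if lo <= m <= hi]


small = levels_for_target(10, 20, 30, 40)     # max distance 40
medium = levels_for_target(100, 50, 90, 60)   # max distance 100
large = levels_for_target(300, 10, 10, 700)   # max distance 700
```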

4. Classification

Train C binary classifiers instead of a single multi-class classifier.
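The difference can be illustrated with a small sketch: each of the C logits goes through its own sigmoid, so the scores are independent, whereas a softmax couples them. The function names are hypothetical and the softmax is shown only for contrast.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def binary_scores(logits):
    """Treat each of the C logits as an independent binary classifier."""
    return [sigmoid(z) for z in logits]


def softmax_scores(logits):
    """A single multi-class classifier, shown only for contrast."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 2.0, -3.0]              # hypothetical logits for C = 3
independent = binary_scores(logits)    # two classes can both score high
coupled = softmax_scores(logits)       # scores are forced to sum to 1
```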

5. Coordinate regression

Anchor-based methods regress the offset between the anchor and the gt box, whereas FCOS regresses, at each point on the feature map, the distances to the top, bottom, left, and right sides of the box (it can be considered point-based).

Formula:

(l∗, t∗, r∗, b∗): the four regression targets;

(x0, y0), (x1, y1): the top-left and bottom-right corners of the gt box in the original image;

(x, y): the coordinates of the point on the feature map;

s: the stride of the corresponding feature map, used to compress the prediction range and make it easier to balance the classification and regression loss weights.
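A sketch of how the four targets could be computed from these symbols, assuming the usual FCOS form l∗ = (x − x0)/s, t∗ = (y − y0)/s, r∗ = (x1 − x)/s, b∗ = (y1 − y)/s with (x, y) already mapped to image coordinates. The helper `feature_to_image` and its cell-center convention are assumptions, not something stated in this post.

```python
def feature_to_image(ix, iy, stride):
    """Map feature-map index (ix, iy) to an image point (assumed: cell center)."""
    return ix * stride + stride // 2, iy * stride + stride // 2


def regression_targets(x, y, gt_box, stride):
    """Stride-normalised distances from image point (x, y) to the box sides."""
    x0, y0, x1, y1 = gt_box
    l = (x - x0) / stride
    t = (y - y0) / stride
    r = (x1 - x) / stride
    b = (y1 - y) / stride
    return l, t, r, b


# Hypothetical example on a stride-8 level.
px, py = feature_to_image(5, 6, stride=8)
targets = regression_targets(px, py, gt_box=(20, 12, 100, 92), stride=8)
```

A point is inside the gt box exactly when all four targets are positive.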

6. Center-ness

The authors found that many low-quality predicted boxes come from points far away from the center of the target. The closer a point is to the center of the target box, the more reliable its predicted box. Therefore, a center-ness branch is added alongside the regression branch to regress how close each point is to the target center. Although this is a regression problem, a cross-entropy loss is used.

With center-ness, boxes with low IoU but high score are greatly reduced, and the consistency between IoU and score improves.
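The center-ness target is not written out in this post. The sketch below assumes the definition from the FCOS paper, sqrt(min(l, r)/max(l, r) × min(t, b)/max(t, b)), and assumes all four distances are positive (the point is inside the box). The function name is hypothetical.

```python
import math


def centerness_target(l, t, r, b):
    """Center-ness in (0, 1]; equals 1 only at the exact center of the box."""
    return math.sqrt((min(l, r) / max(l, r)) * (min(t, b) / max(t, b)))


at_center = centerness_target(5, 5, 5, 5)
off_center = centerness_target(1, 4, 9, 4)   # close to the left edge
```

The value falls as the point moves toward any side of the box, which is why multiplying by it lowers the score of off-center predictions.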

7. Loss function

Lcls: focal loss

Lreg: GIoU loss

Npos: the number of positive samples

λ: set to 1, balances the regression and classification losses

The center-ness loss constrains how close each point is to the center of the gt box, and it is added directly to the classification and regression losses above. During inference, the output score is the classification score multiplied by the center-ness score.
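A loss consistent with the symbols listed here can be written as follows. The notation is an assumption taken from the usual FCOS formulation: p and t are the predicted class scores and box distances at location (x, y), c∗ and t∗ are the corresponding targets, and the indicator selects positive locations only.

```latex
L(\{p_{x,y}\}, \{t_{x,y}\}) =
  \frac{1}{N_{pos}} \sum_{x,y} L_{cls}\left(p_{x,y}, c^{*}_{x,y}\right)
  + \frac{\lambda}{N_{pos}} \sum_{x,y} \mathbb{1}_{\{c^{*}_{x,y} > 0\}}\,
    L_{reg}\left(t_{x,y}, t^{*}_{x,y}\right)
```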

8. Inference

Take the classification score of each point on the feature map and combine it with the four regressed distances to obtain the predicted box, then perform NMS with a threshold of 0.6.
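The inference steps can be sketched in plain Python. This is an illustration under stated assumptions, not the reference implementation: the regressed distances are assumed to be stride-normalised, points are assumed to be in image coordinates, the final score is taken as classification score times center-ness, and a simple greedy NMS is used. All function names and the sample predictions are hypothetical.

```python
def decode_box(x, y, l, t, r, b, stride):
    """Turn stride-normalised distances at image point (x, y) into a box."""
    return (x - l * stride, y - t * stride, x + r * stride, y + b * stride)


def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.6):
    """Greedy NMS: keep a box unless it overlaps a higher-scoring kept box."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# (x, y, l, t, r, b, cls_score, centerness) at stride 8, made-up values.
preds = [
    (48, 48, 3, 3, 3, 3, 0.9, 0.9),
    (56, 48, 4, 3, 2, 3, 0.8, 0.5),      # same box, seen from off-center
    (200, 200, 2, 2, 2, 2, 0.7, 0.8),
]
boxes = [decode_box(x, y, l, t, r, b, 8) for x, y, l, t, r, b, _, _ in preds]
scores = [cls * ctr for *_, cls, ctr in preds]
kept = nms(boxes, scores)
```

The second prediction decodes to the same box as the first but has a lower center-ness, so it ends up with the lowest final score and is suppressed.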

3. Experimental results

Reference: https://www.zybuluo.com/huanghaian/note/1747551


Origin blog.csdn.net/fanzonghao/article/details/109487496