Object detection algorithms: interpreting the anchor-free series

Introduction

Anchor-free methods are another branch in the development of single-stage detectors, so it is worth understanding the most common anchor-free algorithms.

Disclaimer

If anything here is miswritten, mistaken, or misinterpreted, or if you see things differently, please point it out in the comment area and the blogger will look into it carefully.

Original paper download link

CornerNet, CenterNet, FCOS.

1. Basic concepts

1.1 What is anchor free

Anchor free means detecting objects without predefined anchor boxes. Likewise, any other "xxx free" method simply does not use xxx.

1.2 Disadvantages of the anchor mechanism

The anchor mechanism has two main disadvantages:

Unbalanced positive and negative samples
- Because the anchor mechanism generates k boxes at every point of the feature map, boxes that contain no object inevitably make up the vast majority in an image
More prior parameters
- The size, aspect ratio, and number of anchor boxes are all parameters that are hard to choose a priori
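To get a feel for the imbalance, here is a small back-of-the-envelope calculation. All numbers (feature map size, k, number of objects, anchors matched per object) are made-up values for illustration, not taken from any particular detector.

```python
# Hypothetical numbers for illustration only: a 64x64 feature map,
# k = 9 anchors per location, and 5 labeled objects in the image.
feat_h, feat_w, k = 64, 64, 9
num_objects = 5
positives_per_object = 3  # assumed number of anchors matched to each object

total_anchors = feat_h * feat_w * k
positives = num_objects * positives_per_object
negatives = total_anchors - positives
negative_fraction = negatives / total_anchors

print(total_anchors)      # 36864
print(negative_fraction)  # very close to 1: almost every anchor is background
```

Even with generous matching, only a handful of the tens of thousands of anchors are positives.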

2. CornerNet

The key to CornerNet is to locate the top-left and bottom-right corners of each object's bounding box.

2.1 Model Architecture

The model architecture in the paper is shown in the figure below:

[Figure: CornerNet model architecture, from the paper]

The architecture has three main parts: input image + hourglass network + prediction module.

Hourglass network = N stacked Hourglass Modules; each module downsamples the features and then upsamples them back. This network was proposed in a separate paper; see that paper for details.
Prediction module: its structure is shown below:

[Figure: structure of the prediction module]

The prediction module is simple, but two things deserve attention: corner pooling and the three output values.

2.2 corner pooling

First, look at the figure below:

[Figure: example objects and their bounding-box corners]

These objects are not rectangular, so the corners of their bounding boxes (the top-left and bottom-right points) often fall outside the object itself. To make sure a corner is aligned with the topmost and leftmost (or bottommost and rightmost) points of the object, corner pooling is required.

The principle is shown in the figure below:

[Figure: principle of corner pooling]

Max pooling is first applied along a single direction on each of two feature maps (for the top-left corner, one map is pooled rightward and the other downward), and then the values at the same position in the two pooled maps are added. An example is as follows:

[Figure: worked example of corner pooling]
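The pooling step can be sketched in plain Python. This is a minimal illustration of top-left corner pooling under my own assumptions (two small 3x3 maps with made-up values, stored as nested lists); the function name `top_left_corner_pool` is hypothetical and the real layer operates on tensors.

```python
def top_left_corner_pool(f_t, f_l):
    """Top-left corner pooling on two H x W maps given as nested lists.

    f_t is max-pooled downward (each cell sees itself and everything below it),
    f_l is max-pooled rightward (each cell sees itself and everything to its
    right), and the two pooled maps are added element-wise.
    """
    h, w = len(f_t), len(f_t[0])
    t = [row[:] for row in f_t]
    l = [row[:] for row in f_l]
    # Scan bottom-to-top so t[i][j] = max(f_t[i][j], f_t[i+1][j], ...).
    for i in range(h - 2, -1, -1):
        for j in range(w):
            t[i][j] = max(t[i][j], t[i + 1][j])
    # Scan right-to-left so l[i][j] = max(f_l[i][j], f_l[i][j+1], ...).
    for i in range(h):
        for j in range(w - 2, -1, -1):
            l[i][j] = max(l[i][j], l[i][j + 1])
    return [[t[i][j] + l[i][j] for j in range(w)] for i in range(h)]


f_t = [[2, 1, 3],
       [0, 4, 1],
       [1, 0, 2]]
f_l = [[1, 0, 2],
       [3, 1, 0],
       [0, 2, 1]]
pooled = top_left_corner_pool(f_t, f_l)
print(pooled)  # [[4, 6, 5], [4, 5, 2], [3, 2, 3]]
```

Bottom-right corner pooling would be the mirror image: scan top-to-bottom and left-to-right instead.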

2.3 Three output values

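Heatmaps

The heatmaps predict the corner locations: every heatmap location is a candidate key point, and its value is the probability that a corner of a given category lies there. To compute the loss, either map the predicted key points back to the original image and compare them with the ground-truth boxes, or scale the ground-truth boxes down to the heatmap resolution.

Loss formula:

$$L_{det} = -\frac{1}{N} \sum_{c=1}^{C} \sum_{i=1}^{H} \sum_{j=1}^{W} \begin{cases} (1 - p_{cij})^{\alpha} \log(p_{cij}) & \text{if } y_{cij} = 1 \\ (1 - y_{cij})^{\beta} (p_{cij})^{\alpha} \log(1 - p_{cij}) & \text{otherwise} \end{cases}$$

This formula is a modified version of focal loss. Here p_cij is the predicted score at location (i, j) for category c, y_cij is the ground-truth heatmap (softened by a Gaussian around each corner), and N is the number of objects in the image. α down-weights easy samples, β reduces the penalty for locations close to a ground-truth corner, and C is the number of target categories (excluding the background).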
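As a rough sketch of how this modified focal loss behaves, the snippet below evaluates it on flat lists of scores and labels. The function name `corner_focal_loss`, the default values alpha=2 and beta=4, and the toy inputs are assumptions for illustration; a real implementation would work on whole heatmap tensors.

```python
import math


def corner_focal_loss(pred, target, alpha=2.0, beta=4.0, eps=1e-12):
    """pred: predicted probabilities; target: Gaussian-softened labels in [0, 1]."""
    num_pos = sum(1 for y in target if y == 1.0)
    loss = 0.0
    for p, y in zip(pred, target):
        if y == 1.0:
            # Positive location: penalize low scores, down-weight easy ones.
            loss -= (1 - p) ** alpha * math.log(p + eps)
        else:
            # Negative location: (1 - y)^beta shrinks the penalty near a corner.
            loss -= (1 - y) ** beta * p ** alpha * math.log(1 - p + eps)
    return loss / max(num_pos, 1)  # normalize by the number of positives


good = corner_focal_loss([0.9, 0.1], [1.0, 0.0])  # confident and correct
bad = corner_focal_loss([0.1, 0.9], [1.0, 0.0])   # confident and wrong
near = corner_focal_loss([0.9, 0.9], [1.0, 0.8])  # false positive next to a corner
far = corner_focal_loss([0.9, 0.9], [1.0, 0.0])   # false positive far from any corner
print(good < bad, near < far)
```

The last comparison shows the effect of the softened labels: a high score right next to a true corner costs far less than the same score far away from any corner.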

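Offsets

This output is a regression offset used to fine-tune the predicted corners: mapping a location from the original image to the downsampled feature map involves rounding, and the lost precision has to be recovered when mapping back.

Loss formula:

$$L_{off} = \frac{1}{N} \sum_{k=1}^{N} \text{SmoothL1Loss}(o_k, \hat{o}_k)$$

This is the smooth L1 loss commonly used for regression offsets in object detection, where the ground-truth offset o_k is:

$$o_k = \left( \frac{x_k}{n} - \left\lfloor \frac{x_k}{n} \right\rfloor, \frac{y_k}{n} - \left\lfloor \frac{y_k}{n} \right\rfloor \right)$$

where (x_k, y_k) are the coordinates of corner k and n is the downsampling factor.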
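A tiny worked example of the offset may help. The corner coordinates and the downsampling factor n = 4 below are made-up values, and `corner_offset` is a hypothetical helper name.

```python
import math


def corner_offset(x, y, n):
    """Return the feature-map cell of a corner and the fraction lost by flooring."""
    fx, fy = x / n, y / n
    cell = (math.floor(fx), math.floor(fy))
    offset = (fx - cell[0], fy - cell[1])
    return cell, offset


cell, offset = corner_offset(x=101, y=58, n=4)
print(cell, offset)  # (25, 14) (0.25, 0.5)

# Adding the offset back before rescaling recovers the original coordinates.
recovered = ((cell[0] + offset[0]) * 4, (cell[1] + offset[1]) * 4)
print(recovered)  # (101.0, 58.0)
```

Without the offset, mapping cell (25, 14) back would give (100, 56), which is a noticeable error for small objects.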

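Embeddings

An image usually contains more than one object, and therefore more than one pair of corners, so how are the top-left and bottom-right corners matched to each other? The third output answers this: each corner gets an embedding vector, and the smaller the distance between two embeddings, the more likely the two corners belong to the same object.

The loss has two parts: a pull term that draws together the embeddings of the two corners of the same object, and a push term that separates the embeddings of different objects:

$$L_{pull} = \frac{1}{N} \sum_{k=1}^{N} \left[ (e_{t_k} - e_k)^2 + (e_{b_k} - e_k)^2 \right]$$

$$L_{push} = \frac{1}{N(N-1)} \sum_{k=1}^{N} \sum_{j=1, j \neq k}^{N} \max(0, \Delta - |e_k - e_j|)$$

where e_tk and e_bk are the embeddings of the top-left and bottom-right corners of object k, e_k is their average, and Δ is the margin.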
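The two terms can be sketched with scalar embeddings. This is only an illustration under stated assumptions: embeddings are one-dimensional floats, the margin `delta` defaults to 1, and `pull_push_loss` is a hypothetical name.

```python
def pull_push_loss(e_tl, e_br, delta=1.0):
    """e_tl[k], e_br[k]: scalar embeddings of the two corners of object k."""
    n = len(e_tl)
    centers = [(a + b) / 2 for a, b in zip(e_tl, e_br)]
    # Pull: each corner embedding should sit at its object's mean embedding.
    pull = sum((a - c) ** 2 + (b - c) ** 2
               for a, b, c in zip(e_tl, e_br, centers)) / n
    # Push: mean embeddings of different objects should be at least delta apart.
    push = 0.0
    if n > 1:
        for k in range(n):
            for j in range(n):
                if j != k:
                    push += max(0.0, delta - abs(centers[k] - centers[j]))
        push /= n * (n - 1)
    return pull, push


print(pull_push_loss([0.0, 2.0], [0.0, 2.0]))  # (0.0, 0.0): well separated
print(pull_push_loss([0.0, 0.5], [0.0, 0.5]))  # (0.0, 0.5): objects too close
print(pull_push_loss([0.0], [1.0]))            # (0.5, 0.0): corners disagree
```

At inference time, a top-left and a bottom-right corner would be paired when their embedding distance is small.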

2.4 Gaussian circle

Look at the figure below:

[Figure: predicted boxes whose corners lie close to the ground-truth corners]

The predicted box above is already very good, but because its corners do not coincide exactly with the ground-truth corners, it would still be penalized in the loss. We therefore introduce a threshold that reduces the penalty for corners close enough to the ground truth. This method is called the Gaussian circle.

The principle is shown in the figure, which takes the critical situation as an example and splits it into three cases:

[Figure: the three critical cases used to derive the Gaussian radius]

Assume a critical IoU of 0.7 (or some other value); the radius r of the Gaussian circle can then be computed from the three cases above.
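One way to turn the three cases into a radius is to solve each for the largest corner shift r that still gives the critical IoU, then take the smallest. The sketch below is my own derivation under the assumption that each corner is shifted by r in both x and y; the function name `gaussian_radius` and the test box are illustrative, and released implementations may differ in details.

```python
import math


def gaussian_radius(w, h, min_iou=0.7):
    """Largest corner shift r that keeps IoU >= min_iou in all three cases."""
    s = w + h
    t = min_iou
    # Case 1: both corners move inward by r (prediction lies inside the GT box).
    #   (w - 2r)(h - 2r) / (w h) = t
    r_inside = (s - math.sqrt(s * s - 4 * (1 - t) * w * h)) / 4
    # Case 2: both corners move outward by r (prediction contains the GT box).
    #   w h / ((w + 2r)(h + 2r)) = t
    r_outside = (-s + math.sqrt(s * s + 4 * (1 - t) * w * h / t)) / 4
    # Case 3: one corner moves inward, the other outward (box shifted by r).
    #   I / (2 w h - I) = t  with  I = (w - r)(h - r)
    r_shift = (s - math.sqrt(s * s - 4 * w * h * (1 - t) / (1 + t))) / 2
    return min(r_inside, r_outside, r_shift)


r = gaussian_radius(40.0, 20.0, min_iou=0.7)
print(r)  # a little over 2 pixels for a 40x20 box
```

Taking the minimum is the conservative choice: any corner within r of the ground truth then yields a box with IoU of at least the threshold in every case.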

3. CenterNet

CenterNet detects objects through their center points.

3.1 Model Architecture

The architecture of CenterNet is similar to that of CornerNet and is divided into three parts:

input image
backbone
- Hourglass, ResNet, etc. can be used
prediction module
- The exact structure depends on the backbone used, but it always outputs three values.

Taking ResNet as the backbone, the pipeline is:

1. Resize the image to 512*512, so the input shape is [1, 3, 512, 512]
2. ResNet outputs [1, 2048, 16, 16]
3. Upsample the output feature map with deconvolutions to [1, 64, 128, 128]
4. Feed it into three prediction branches:
	heatmaps: [1, 80, 128, 128] (80 is the number of categories)
	sizes: [1, 2, 128, 128] (2 = width and height of the box)
	offsets: [1, 2, 128, 128] (2 = offsets in x and y)
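The shapes in the steps above follow from two strides: the backbone reduces 512 to 16 (a factor of 32) and the final output is 128 (a factor of 4). The helper below just reproduces this bookkeeping; `centernet_shapes` and its parameter names are hypothetical and no network is built.

```python
def centernet_shapes(batch=1, size=512, num_classes=80,
                     backbone_stride=32, backbone_channels=2048,
                     output_stride=4, head_channels=64):
    """Return the tensor shapes at each stage as (N, C, H, W) tuples."""
    feat = size // backbone_stride  # 512 / 32 = 16
    out = size // output_stride     # 512 / 4 = 128
    return {
        "input": (batch, 3, size, size),
        "backbone": (batch, backbone_channels, feat, feat),
        "upsampled": (batch, head_channels, out, out),
        "heatmaps": (batch, num_classes, out, out),
        "sizes": (batch, 2, out, out),
        "offsets": (batch, 2, out, out),
    }


shapes = centernet_shapes()
for name, shape in shapes.items():
    print(name, shape)
```

Going from 16 to 128 is three doublings, which is why three stride-2 deconvolution layers are a natural fit for step 3.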

3.2 Loss function

The loss function of CenterNet is broadly similar to that of CornerNet, but the details differ slightly:

$$L_{det} = L_k + \lambda_{size} L_{size} + \lambda_{off} L_{off}$$

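The specific formulas are as follows:

$$L_{off} = \frac{1}{N} \sum_{p} \left| \hat{O}_{\tilde{p}} - \left( \frac{p}{R} - \tilde{p} \right) \right|$$

$$L_{size} = \frac{1}{N} \sum_{k=1}^{N} \left| \hat{S}_{p_k} - s_k \right|$$

Here N is the number of key points, s_k is the ground-truth size of object k and \hat{S}_{p_k} is the size predicted at its center, p is the center point coordinates, R is the downsampling factor, and \tilde{p} = ⌊p/R⌋ is the scaled center point rounded down to an integer location (the same idea as in CornerNet).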
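As a concrete illustration of these quantities, the snippet below builds the training targets for a single box. The box coordinates and R = 4 are made-up values, `center_targets` is a hypothetical helper, and the size is kept in input-image pixels (some implementations regress it at feature-map resolution instead).

```python
import math


def center_targets(box, R=4):
    """box = (x1, y1, x2, y2) in input-image pixels."""
    x1, y1, x2, y2 = box
    p = ((x1 + x2) / 2, (y1 + y2) / 2)           # center point p
    p_scaled = (p[0] / R, p[1] / R)              # p / R
    p_tilde = (math.floor(p_scaled[0]), math.floor(p_scaled[1]))
    offset = (p_scaled[0] - p_tilde[0], p_scaled[1] - p_tilde[1])
    size = (x2 - x1, y2 - y1)                    # s_k = (w, h)
    return p_tilde, offset, size


p_tilde, offset, size = center_targets((100, 50, 210, 150))
print(p_tilde, offset, size)  # (38, 25) (0.75, 0.0) (110, 100)
```

The heatmap target peaks at cell (38, 25), while the offset and size targets are only supervised at that cell.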

3.3 Three output values

sizes

The width and height of the box; together with the center key point they determine the whole box.

offsets

The offset of the center point. When a ground-truth center is mapped onto the feature map, the rounding operation introduces an error, so an offset is predicted to compensate for it.

heatmaps

One heatmap is generated per category; it gives the center point locations and the probability that an object is present there.
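Putting the three outputs together, a detected peak can be turned back into a box roughly as follows. This is a sketch under assumptions: the peak cell, offset and size values are invented, the size is taken to be in input-image pixels, and `decode_center` is a hypothetical name.

```python
def decode_center(cx, cy, offset, size, R=4):
    """Map a heatmap peak at integer cell (cx, cy) back to an image-space box."""
    x = (cx + offset[0]) * R  # refined center, back at input resolution
    y = (cy + offset[1]) * R
    w, h = size
    return (x - w / 2, y - h / 2, x + w / 2, y + h / 2)


box = decode_center(38, 25, offset=(0.75, 0.0), size=(110, 100))
print(box)  # (100.0, 50.0, 210.0, 150.0)
```

Because each peak yields exactly one box, this style of decoding needs no anchor matching.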

4. FCOS

Unlike CornerNet and CenterNet, FCOS detects objects by predicting, at each key point, the four distances from that point to the sides of the bounding box.

4.1 Model Architecture

The model architecture is shown in the figure below (the original figure from the paper):
[Figure: FCOS model architecture]

The architecture is easy to read, but there are a few key points:

Reasons for the introduction of FPN and PAN
three output values
loss function

4.2 Introduction of FPN and PAN

First, we need to know what an ambiguous sample is: a key point that falls inside multiple ground-truth (GT) boxes at the same time.

For such points, FPN and PAN are introduced so that feature maps of different sizes predict objects of different sizes. Overlapping objects usually differ in size and are therefore handled on different levels, which avoids ambiguous samples as far as possible. At the same time, it also improves the accuracy of the model.
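The idea of sending objects of different sizes to different pyramid levels can be sketched as a lookup on the largest regression distance. The ranges below are assumed values for illustration (five levels numbered 3 to 7), and `assign_level` is a hypothetical helper; check the paper or your codebase for the thresholds actually used.

```python
import math

# Assumed regression ranges per pyramid level (levels 3..7), in input pixels.
RANGES = [(0, 64), (64, 128), (128, 256), (256, 512), (512, math.inf)]


def assign_level(l, t, r, b, ranges=RANGES):
    """Pick the level whose range contains the largest of the four distances."""
    m = max(l, t, r, b)
    for level, (lo, hi) in enumerate(ranges, start=3):
        if lo <= m <= hi:
            return level  # boundary values go to the first matching level
    return None


# One key point that lies inside two overlapping GT boxes:
small_box_level = assign_level(10, 20, 30, 40)      # distances to a small box
large_box_level = assign_level(300, 120, 200, 180)  # distances to a large box
print(small_box_level, large_box_level)  # 3 6
```

The same point is a positive for the small box on level 3 and for the large box on level 6, so on each level it is no longer ambiguous.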

4.3 Three output values

classes: H, W, C
- Classification output, where C is the number of categories
Regression: H, W, 4
- Box regression output, where 4 stands for the distances from the key point to the four sides of the box
Center-ness: H, W, 1
- Center-ness output, used to filter out low-quality false detections; it measures how close a key point (any point inside a box, not necessarily the center point) is to the center of the GT box

4.4 Loss function

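The formula is as follows:

$$L(\{p_{x,y}\}, \{t_{x,y}\}) = \frac{1}{N_{pos}} \sum_{x,y} L_{cls}(p_{x,y}, c^*_{x,y}) + \frac{\lambda}{N_{pos}} \sum_{x,y} 1_{\{c^*_{x,y} > 0\}} L_{reg}(t_{x,y}, t^*_{x,y})$$

Here L_cls is focal loss, which is built on binary cross-entropy (both positive and negative samples take part in the calculation), and L_reg is IoU loss (only positive samples, where c*_{x,y} > 0, take part).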
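Because both the prediction and the target are distances measured from the same point, the IoU can be computed directly from the (l, t, r, b) values. The snippet below is a minimal sketch of a negative-log IoU loss under that assumption; `iou_loss`, `eps` and the sample distances are illustrative choices, not the exact implementation from any codebase.

```python
import math


def iou_loss(pred, target, eps=1e-9):
    """pred, target: (l, t, r, b) distances measured from the same location."""
    pl, pt, pr, pb = pred
    tl, tt, tr, tb = target
    area_p = (pl + pr) * (pt + pb)
    area_t = (tl + tr) * (tt + tb)
    # Both boxes contain the location, so the overlap extends min(...) per side.
    inter = (min(pl, tl) + min(pr, tr)) * (min(pt, tt) + min(pb, tb))
    union = area_p + area_t - inter
    return -math.log((inter + eps) / (union + eps))


perfect = iou_loss((20, 20, 20, 20), (20, 20, 20, 20))
half_size = iou_loss((10, 10, 10, 10), (20, 20, 20, 20))
print(perfect, half_size)  # about 0 and about ln(4)
```

In the second call the predicted box has a quarter of the target's area and lies entirely inside it, so the IoU is 0.25.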

In addition, there is a third part of the loss, the center-ness loss. The formula is as follows:
[Formula: center-ness loss]

Here s_{x,y} is the center-ness value of the point (x, y) on the feature map. The author gives its formula and meaning in the paper; it measures how far the key point is from the center of the box:

$$\text{centerness}^* = \sqrt{\frac{\min(l^*, r^*)}{\max(l^*, r^*)} \times \frac{\min(t^*, b^*)}{\max(t^*, b^*)}}$$

where l*, t*, r*, b* are the distances from the point to the left, top, right and bottom sides of the GT box.
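A quick numeric check shows how the center-ness value falls off away from the center. The distances below are made-up examples and `centerness` is simply a direct transcription of the formula.

```python
import math


def centerness(l, t, r, b):
    """Center-ness target from the four distances to the box sides."""
    return math.sqrt((min(l, r) / max(l, r)) * (min(t, b) / max(t, b)))


at_center = centerness(50, 50, 50, 50)   # exactly in the middle of the box
off_center = centerness(10, 50, 90, 50)  # shifted toward the left side
print(at_center, off_center)  # 1.0 and about 0.333
```

At inference, multiplying the classification score by this value pushes boxes predicted from off-center points down the ranking, so they are more likely to be filtered out.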

4.5 Regression

In FCOS, the regression output is the distance from the key point to each of the four sides of the box, as shown in the figure below:

[Figure: the four regression distances l, t, r, b from a key point to the sides of the box]

The calculation formula is as follows:

$$l^* = x - x_0, \quad t^* = y - y_0, \quad r^* = x_1 - x, \quad b^* = y_1 - y$$

where (x, y) is the key point and (x_0, y_0), (x_1, y_1) are the top-left and bottom-right corners of the GT box. Therefore, the closer the four distances output by the model are to these true distances, the better the detection.
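The relation between a key point, the four distances and the box can be sketched in both directions. The point and box below are made-up values, and `ltrb_targets` and `decode_ltrb` are hypothetical helper names.

```python
def ltrb_targets(x, y, box):
    """Distances from point (x, y) to the left, top, right, bottom of the box."""
    x0, y0, x1, y1 = box
    return (x - x0, y - y0, x1 - x, y1 - y)


def decode_ltrb(x, y, ltrb):
    """Inverse mapping: rebuild (x0, y0, x1, y1) from a point and its distances."""
    l, t, r, b = ltrb
    return (x - l, y - t, x + r, y + b)


gt_box = (20, 10, 120, 90)
targets = ltrb_targets(60, 40, gt_box)
print(targets)                       # (40, 30, 60, 50)
print(decode_ltrb(60, 40, targets))  # (20, 10, 120, 90)
```

A point counts as inside the box only when all four distances are positive, which is one simple way to decide whether it can be a positive sample.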

5. Summary

The above briefly covered three classic anchor-free algorithms, all of which detect objects from some kind of key point. Although the anchor mechanism is not used, generating key points is in fact somewhat similar to the anchor idea: an anchor-based detector generates a fixed number of proposal boxes at every point of the feature map, while an anchor-free detector generates a heatmap in which every point is a candidate key point, and these key points are then compared with the ground-truth boxes mapped onto the heatmap to train the model.