RetinaNet

Reference paper

Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, Piotr Dollár, Focal Loss for Dense Object Detection, ICCV 2017.

https://arxiv.org/abs/1708.02002

Development history

We focus on work after the resurgence of neural networks; earlier content is only briefly mentioned.

Two camps

The two camps are the one-stage and two-stage detectors. Both care about accuracy and speed, but each ultimately trades one off against the other.

|            | one-stage       | two-stage       |
| ---------- | --------------- | --------------- |
| algorithms | YOLO, YOLT      | R-CNN, SPPnet   |
| accuracy   | low (30% mAP)   | high (60%+ mAP) |
| speed      | fast (100+ FPS) | slow (5 FPS)    |

Two-stage

Its main feature is to use an algorithm (selective search, RPN, etc.) to generate a series of proposals, which are then fed to a pre-trained network (VGG-16, ResNet, etc.) for classification.

In particular, RPN pre-sorts candidates into background vs. foreground.

One-stage

Most models use a series of sliding-window-like classifiers ("anchors") to classify directly.
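To make "anchor" concrete, here is a minimal sketch of dense anchor generation (illustrative only; the sizes, ratios, and stride are assumed values rather than any particular detector's configuration):

```python
import numpy as np

def make_anchors(feat_h, feat_w, stride, sizes=(32, 64), ratios=(0.5, 1.0, 2.0)):
    """Place one set of anchor boxes centred on every feature-map cell.

    Returns an array of shape (feat_h * feat_w * len(sizes) * len(ratios), 4)
    holding (x1, y1, x2, y2) boxes in input-image coordinates.
    """
    anchors = []
    for i in range(feat_h):
        for j in range(feat_w):
            cx, cy = (j + 0.5) * stride, (i + 0.5) * stride  # cell centre
            for s in sizes:
                for r in ratios:
                    w, h = s * np.sqrt(r), s / np.sqrt(r)  # area stays ~ s*s
                    anchors.append([cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2])
    return np.array(anchors)

print(make_anchors(4, 4, stride=16).shape)  # (96, 4): 16 cells x 6 anchors
```

A classifier is then evaluated at every one of these positions, which is why the example counts discussed below get so large.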

RetinaNet

The model in this paper is one-stage, and its purpose, evidently, is to improve accuracy.

The main contribution

It makes the point that an important cause of one-stage detectors' insufficient accuracy is class imbalance.

class imbalance

The so-called class imbalance arises as follows: two-stage models mostly pre-sort background (bg) and foreground (fg) candidates, so the number of bg examples does not end up vastly larger than the number of fg examples. One-stage models, however, abandon the proposal step in order to gain speed, so most of them cannot do this pre-sorting, which leads to very uneven numbers, often differing by two orders of magnitude.
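A rough back-of-the-envelope illustration (the counts here are assumptions chosen for the example, not figures from the paper): even if each easy bg anchor contributes only a tiny loss, sheer numbers let bg dominate the total,
$$
\underbrace{10^4 \times 0.1}_{\text{easy bg}} = 10^3 \gg \underbrace{10^2 \times 1}_{\text{hard fg}} = 10^2,
$$
so the gradient is driven almost entirely by easy background examples.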

Some previous solutions

OHEM directly discards a fraction of the easy examples, which inevitably makes the training data incomplete and thereby hurts the results.

This paper's solution

It proposes a new loss function:
$$
CE(p_t) = -\log(p_t) \\
FL(p_t) = \alpha_t (1 - p_t)^\gamma \, CE(p_t)
$$

where

$$
p_t = \begin{cases} p & y = 1 \\ 1 - p & \text{otherwise} \end{cases}
\qquad
\alpha_t = \begin{cases} \alpha & y = 1 \\ 1 - \alpha & \text{otherwise} \end{cases}
$$
In particular, $CE$ is the cross entropy; the experiments find that $\gamma = 2$ and $\alpha = 0.25$ give the best results.
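A minimal PyTorch sketch of this loss, directly following the definitions above (the $\pm 1$ label convention matches the paper; the test values at the end are my own):

```python
import torch

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Per-example focal loss.

    p : predicted probability of the positive class, in (0, 1)
    y : ground-truth label, +1 for foreground, -1 for background
    """
    p_t = torch.where(y == 1, p, 1 - p)  # probability of the true class
    alpha_t = torch.where(y == 1,
                          torch.full_like(p, alpha),
                          torch.full_like(p, 1 - alpha))
    return -alpha_t * (1 - p_t) ** gamma * torch.log(p_t)

p = torch.tensor([0.9, 0.9])  # same confidence, but...
y = torch.tensor([1, -1])     # ...one correct (easy), one wrong (hard)
print(focal_loss(p, y))       # ~2.6e-4 vs ~1.40: the hard example dominates
```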

(Figure: loss curves plotted against the probability of the ground-truth class)

From the figure we can see that for categories the model is relatively sure about (easy samples, e.g. bg) the loss is made small, while for categories it is unsure about (hard samples, e.g. fg) the loss stays large, thus preventing the class that dominates in numbers from dominating the loss.

We analyze the loss function in four cases:

  1. Correctly classified, easy example - $y = 1,\ p \approx 1$

    In this case $p_t = p \approx 1$, and with $\gamma > 0$ the factor $(1 - p_t)^\gamma \approx 0$, so $FL(p_t) \ll CE(p_t)$

  2. Misclassified, hard example - $y = 1,\ p \approx 0$

    In this case $p_t = p \approx 0$, so $(1 - p_t)^\gamma \approx 1$ and $FL(p_t) \approx CE(p_t)$

  3. Misclassified, hard example - $y = -1,\ p \approx 1$

    In this case $p_t = 1 - p \approx 0$, so $FL(p_t) \approx CE(p_t)$

  4. Correctly classified, easy example - $y = -1,\ p \approx 0$

    In this case $p_t = 1 - p \approx 1$, so $FL(p_t) \ll CE(p_t)$

In a nutshell, the influence of the huge number of easily classified examples is reduced.

In particular, it is also very reasonable that the loss of misclassified (hard) examples is left essentially untouched.
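Plugging in concrete numbers (with $\gamma = 2$ and the $\alpha_t$ factor omitted for clarity) makes the effect visible:
$$
p_t = 0.9: \quad CE \approx 0.105, \quad FL \approx (0.1)^2 \cdot 0.105 \approx 1.05 \times 10^{-3} \\
p_t = 0.1: \quad CE \approx 2.303, \quad FL \approx (0.9)^2 \cdot 2.303 \approx 1.87
$$
The easy example's loss shrinks by two orders of magnitude, while the hard example's barely changes.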

Extra: network architecture

(Figure: RetinaNet architecture)

The left part of the network is easily recognized as an FPN; the right half is their own design. It is easy to see (as also mentioned in the paper) that the upper and lower branches are two convolutional subnets whose parameters are not shared: one is used for classification, the other for anchor-box regression (4 dims).
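A minimal PyTorch sketch of those two subnets (the depth, width, $A = 9$ anchors, and $K = 80$ classes follow the paper's COCO settings; everything else is simplified):

```python
import torch.nn as nn

def subnet(out_channels, channels=256, depth=4):
    """One head branch: `depth` 3x3 conv + ReLU layers, then a 3x3 predictor."""
    layers = []
    for _ in range(depth):
        layers += [nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU()]
    layers.append(nn.Conv2d(channels, out_channels, 3, padding=1))
    return nn.Sequential(*layers)

A, K = 9, 80                # anchors per location, object classes
cls_subnet = subnet(A * K)  # classification branch
box_subnet = subnet(A * 4)  # box-regression branch (4 dims per anchor)
# Identical structure but separate modules, i.e. parameters are not shared;
# both subnets are applied to the feature map of every FPN level.
```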

Initialization

Initialization of the classification subnet's bias:
$$
b = -\log\left(\frac{1 - \pi}{\pi}\right)
$$
Here $\pi$ means that at initialization, every anchor is treated as foreground with confidence $\pi$; the paper finds $\pi = 0.01$ to be appropriate.
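A quick numerical check of this initialization (a sketch; variable names are mine):

```python
import math

pi = 0.01                        # desired initial foreground confidence
b = -math.log((1 - pi) / pi)     # bias of the final classification layer
print(b)                         # ~ -4.595
print(1 / (1 + math.exp(-b)))    # sigmoid(b) == pi == 0.01, as intended
```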

Experimental results

(Table: experimental results)

A note on the table: "OHEM 1:3" means that after OHEM discards easy (low-loss) examples, the fg:bg ratio is constrained to 1:3.

(Figure: one-stage vs. two-stage accuracy/speed comparison)

Judging from this figure, RetinaNet can be called truly state-of-the-art.

Appendix

A

First define:
$$
x_t = yx, \quad y \in \{\pm 1\}
$$
where $x$ is the model's raw output (the logit).

The paper also tried a variant, $FL^* = -\log(p_t^*)/\gamma$ with $p_t^* = \sigma(\gamma x_t + \beta)$; the conclusion, in the original paper's own words:

More generally, we expect any loss function with similar properties as FL or FL* to be equally effective.

B

$$
\frac{dCE}{dx} = y(p_t - 1) \\
\frac{dFL}{dx} = y(1 - p_t)^\gamma(\gamma p_t \log(p_t) + p_t - 1) \\
\frac{dFL^*}{dx} = y(p_t^* - 1)
$$


The conclusion is that for $x_t > 0$ (correctly classified examples), the derivative of FL is much closer to zero than that of CE.
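These derivatives are easy to sanity-check with autograd (a sketch with $\alpha_t$ omitted, using the identity $p_t = \sigma(yx)$, which follows from the definitions above):

```python
import torch

gamma, y = 2.0, 1.0                        # gamma = 2; label y in {+1, -1}
x = torch.tensor(1.5, requires_grad=True)  # logit

p_t = torch.sigmoid(y * x)                 # probability of the true class
fl = -(1 - p_t) ** gamma * torch.log(p_t)  # focal loss
fl.backward()

# Closed-form derivative from the formula above; should equal x.grad.
analytic = y * (1 - p_t) ** gamma * (gamma * p_t * torch.log(p_t) + p_t - 1)
print(x.grad.item(), analytic.item())
```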

To sum up

Be good at discovering the essence of a problem: the authors identified an important reason why one-stage algorithms lag behind two-stage ones, instead of simply following the then-popular methods.


Source: www.cnblogs.com/edbean/p/11267242.html