Focal Loss (RetinaNet) paper reading notes

This paper presents Focal Loss, which improves the loss function in order to obtain a better detection model and higher detection accuracy.

Motivation

(1) First, the meaning of two terms:

  • hard example: a sample that is hard to classify. Taking binary classification as an example, where a positive example is labeled 1 and a negative example 0, a sample whose predicted probability stays close to 0.5 during training is a hard example; hard examples are the ones from which the model learns the most useful information (see the small numerical illustration after this list).
  • easy example: the opposite of the above, i.e. a sample the model already classifies confidently and correctly.
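
To make the hard/easy distinction concrete, here is a minimal Python illustration (the probabilities are invented for illustration): the cross-entropy loss of a confident, correct prediction is tiny compared with that of a prediction hovering near 0.5.

```python
import math

def binary_ce(p, y):
    """Binary cross-entropy for one prediction p in (0, 1) with label y in {0, 1}."""
    return -math.log(p) if y == 1 else -math.log(1.0 - p)

print(binary_ce(0.95, 1))  # easy positive: loss ~ 0.05
print(binary_ce(0.55, 1))  # hard positive: loss ~ 0.60
```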

(2) Problems of conventional one-stage detectors relative to two-stage detectors:

  • Negative examples greatly outnumber positives, so the model learns little useful information;
  • Easy negatives dominate training and degrade the model.

(3) Analysis of reasons:

  • One-stage detectors are inferior to two-stage detectors mainly because of class imbalance during training;
  • Two-stage methods such as Faster R-CNN first generate candidate regions and can then filter them, which largely resolves the class-imbalance problem (regions with IoU > 0.7 are treated as positives, regions with IoU < 0.3 as negatives, and the rest are not used for training);
  • One-stage detectors such as YOLO and SSD, however, are end-to-end: every candidate box they generate is used for training, and the candidates cannot be filtered in between. Since the number of candidate boxes far exceeds the number of actual targets, class imbalance arises and causes the problems described above.

Solution

(1) Traditional workaround: perform some form of hard negative mining, i.e. sample hard examples during training (a rough sketch follows after this list).
(2) Method used in this paper: change the loss function by re-weighting it, increasing the loss weight of hard examples and decreasing the loss weight of easy examples, so that training focuses on hard examples.
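
For contrast with the paper's approach, here is a rough sketch of what hard negative mining of type (1) might look like, assuming per-sample losses have already been computed; the 3:1 negative-to-positive ratio and the helper name are illustrative, not taken from the paper.

```python
import numpy as np

def hard_negative_mining(losses, labels, neg_pos_ratio=3):
    """Keep all positives plus only the highest-loss negatives (illustrative 3:1 ratio)."""
    losses = np.asarray(losses, dtype=float)
    labels = np.asarray(labels)
    pos_mask = labels == 1
    num_neg = int(pos_mask.sum()) * neg_pos_ratio
    neg_losses = np.where(pos_mask, -np.inf, losses)   # mask out positives
    hardest_neg = np.argsort(neg_losses)[::-1][:num_neg]
    keep = pos_mask.copy()
    keep[hardest_neg] = True
    return keep  # boolean mask of the samples actually used for training
```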

Focal Loss

Focal loss is a modification built on top of the cross-entropy loss. First, recall the binary cross-entropy loss:
CE(p, y) = -log(p) if y = 1, and -log(1 - p) otherwise. Writing p_t = p when y = 1 and p_t = 1 - p otherwise, this becomes CE(p_t) = -log(p_t).
As the formula shows, for ordinary cross entropy the loss on a positive sample shrinks as the predicted probability grows, and the loss on a negative sample shrinks as the predicted probability falls. When a large number of easy samples dominate, the loss function decreases slowly during the iterative process and may never be optimized to the best solution.
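
A quick back-of-the-envelope check of that claim (the counts and probabilities are invented for illustration): even though each easy example contributes almost nothing, a large enough number of them swamps the few hard examples in the total loss.

```python
import numpy as np

hard = -np.log(np.full(10, 0.5))        # 10 hard examples, p_t = 0.5
easy = -np.log(np.full(10_000, 0.95))   # 10,000 easy examples, p_t = 0.95
print(hard.sum())  # ~6.9
print(easy.sum())  # ~513  -> easy examples dominate the total loss
```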
So how does Focal loss improve on this?

FL(p_t) = -(1 - p_t)^γ · log(p_t)
First, a modulating factor (1 - p_t)^γ with γ > 0 is added to the original loss. It reduces the loss of easily classified samples so that training focuses more on hard, misclassified samples. Adjusting γ changes the rate at which easy samples are down-weighted: when γ = 0 the focal loss reduces to the cross-entropy loss, and as γ increases the influence of the modulating factor grows. Experiments found γ = 2 to work best.
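
The effect of the modulating factor can be checked numerically; this small sketch just evaluates (1 - p_t)^γ for an easy example (p_t = 0.9) and a hard one (p_t = 0.5) at several values of γ.

```python
def focal_weight(p_t, gamma):
    """Modulating factor (1 - p_t) ** gamma that scales the cross-entropy term."""
    return (1.0 - p_t) ** gamma

for gamma in (0, 0.5, 2, 5):
    print(gamma, focal_weight(0.9, gamma), focal_weight(0.5, gamma))
# With gamma = 2, the easy example's loss is scaled by 0.01, the hard one's only by 0.25.
```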
Furthermore, a balance factor α_t is added to balance the uneven ratio of positive and negative samples themselves:
FL(p_t) = -α_t · (1 - p_t)^γ · log(p_t)
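
Putting the two pieces together, here is a minimal NumPy sketch of the alpha-balanced focal loss (it takes probabilities rather than logits and clips with a small eps for numerical stability; the defaults α = 0.25 and γ = 2 follow the paper, but the function itself is an illustration, not a reference implementation). In the paper, the per-anchor losses are summed and normalized by the number of positive anchors.

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-7):
    """Per-sample alpha-balanced focal loss for labels y in {0, 1} and probabilities p."""
    p = np.clip(p, eps, 1.0 - eps)
    p_t = np.where(y == 1, p, 1.0 - p)              # probability of the true class
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)  # class-balance weight
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)

p = np.array([0.95, 0.55, 0.05, 0.40])
y = np.array([1, 1, 0, 0])
print(focal_loss(p, y))  # easy samples get much smaller losses than hard ones
```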

RetinaNet

The authors also designed a simple dense one-stage detection network, RetinaNet, which stays simple while reaching the best accuracy.
(Figure: the RetinaNet network architecture.)
As the figure shows, the network structure itself is not particularly innovative, but because it uses Focal Loss it achieves very good accuracy.
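
One detail of the head that is directly tied to Focal Loss: the paper initializes the final classification layer's bias so that every anchor starts with a low foreground prior (π = 0.01), which keeps the flood of easy negatives from destabilizing early training. The PyTorch sketch below is an illustrative reconstruction rather than the official implementation; the channel count, number of anchors, and prior follow the paper's defaults.

```python
import math
import torch.nn as nn

class ClassificationSubnet(nn.Module):
    """Illustrative RetinaNet classification head, shared across all FPN levels."""
    def __init__(self, in_channels=256, num_anchors=9, num_classes=80, prior=0.01):
        super().__init__()
        layers = []
        for _ in range(4):  # four 3x3 conv + ReLU blocks
            layers += [nn.Conv2d(in_channels, in_channels, 3, padding=1), nn.ReLU()]
        self.tower = nn.Sequential(*layers)
        self.cls_logits = nn.Conv2d(in_channels, num_anchors * num_classes, 3, padding=1)
        # Bias init so each anchor's initial foreground probability is roughly `prior`.
        nn.init.constant_(self.cls_logits.bias, -math.log((1 - prior) / prior))

    def forward(self, feature_map):
        return self.cls_logits(self.tower(feature_map))
```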

Origin blog.csdn.net/qq_41332469/article/details/90214829