Introduction to Focal Loss

Foreword

  In object detection algorithms, we often encounter Focal Loss. Today we will briefly dissect this loss.

1. Focal Loss

  During deep-learning training, if the loss is computed directly on data whose target categories are imbalanced, the final result tends to be biased toward the common classes, leading to problems such as low recall and overfitting. Focal Loss was introduced to deal with this. Specifically, Focal Loss introduces a tunable parameter $\gamma$ that adjusts the weighting between easy-to-classify samples and hard samples: when $\gamma$ is low, the model pays relatively more attention to easy samples; when $\gamma$ is higher, the model pays more attention to hard samples. Focal Loss is defined as follows:
$$FL(p_t) = -\alpha (1 - p_t)^{\gamma} \log(p_t)$$

where $p_t$ is the probability the model assigns to the ground-truth class and $\alpha$ is a class-balancing weight.
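
  A minimal PyTorch sketch of this definition (the function name, binary-label assumption, and parameter defaults are illustrative, not from the original post):

```python
import torch

def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-8):
    """Binary focal loss: FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t).

    p: predicted probabilities in (0, 1); y: binary labels (0 or 1).
    """
    # p_t is the probability assigned to the true class
    p_t = torch.where(y == 1, p, 1.0 - p)
    # eps keeps log() finite when p_t is close to 0
    return -alpha * (1.0 - p_t) ** gamma * torch.log(p_t + eps)

# Example: an easy positive contributes far less than a hard positive
p = torch.tensor([0.9, 0.1])
y = torch.tensor([1, 1])
print(focal_loss(p, y))  # tensor([0.0003, 0.4663])
```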

  In object detection, Focal Loss is usually used in one-stage algorithms, because a two-stage algorithm has already screened the candidates once in its first stage, so the candidate boxes in the second stage do not suffer a severe imbalance of positive and negative samples. For example, the targets in an image are usually few, at most a few dozen, while the number of candidate boxes we generate is relatively large, often tens of thousands. If each target contributes a loss of 10, the total target loss is only a few hundred; if each remaining candidate box contributes a loss of just 0.1, the background loss still reaches several thousand. The network then cares mostly about non-targets, i.e. the background, leading to very poor detection results.
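
  To make that imbalance concrete, a toy calculation using the hypothetical figures above:

```python
# Hypothetical figures from the example above, for illustration only
num_targets, per_target_loss = 50, 10.0        # at most a few dozen real objects
num_background, per_bg_loss = 50_000, 0.1      # tens of thousands of candidate boxes

print(num_targets * per_target_loss)     # 500.0  -> total foreground loss
print(num_background * per_bg_loss)      # 5000.0 -> background loss dominates 10:1
```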
  To summarize the function of Focal Loss in one sentence: it gives the network hyperparameters that re-weight positive and negative samples, so that the network balances its attention between targets and background instead of favoring one side.
Consider an example, where p is the predicted probability, y the ground-truth label, CE the ordinary cross-entropy loss, and FL the Focal Loss computed with $\gamma = 2$ and $\alpha = 0.25$ (positives weighted by $\alpha$, negatives by $1 - \alpha$):

p     | y | CE    | FL       | CE/FL
0.9   | 1 | 0.105 | 0.00026  | 400
0.968 | 1 | 0.033 | 0.000008 | 3906
0.1   | 0 | 0.105 | 0.00079  | 133
0.032 | 0 | 0.033 | 0.000025 | 1302
0.1   | 1 | 2.3   | 0.466    | 4.9
0.9   | 0 | 2.3   | 1.4      | 1.6
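
A quick sketch to reproduce the table (assuming natural-log cross-entropy and the $\alpha$-balancing described above; the helper names are ours):

```python
import math

def ce(p_t):
    # Ordinary cross-entropy on the true-class probability
    return -math.log(p_t)

def fl(p_t, alpha_t, gamma=2.0):
    # Focal loss: down-weight well-classified samples by (1 - p_t)^gamma
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

for p, y in [(0.9, 1), (0.968, 1), (0.1, 0), (0.032, 0), (0.1, 1), (0.9, 0)]:
    p_t = p if y == 1 else 1.0 - p        # probability of the true class
    alpha_t = 0.25 if y == 1 else 0.75    # alpha for positives, 1 - alpha for negatives
    print(p, y, round(ce(p_t), 3), fl(p_t, alpha_t), round(ce(p_t) / fl(p_t, alpha_t)))
```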

  From this example we can see that Focal Loss sharply reduces the weight of easy samples while only slightly reducing the weight of hard samples. One caveat: since mislabeled samples also look like hard examples and get up-weighted, Focal Loss is susceptible to noise.
Here is a question to think about: what is the difference between Focal Loss and OHEM (which samples positive and negative examples)?

2. Summary

Definitions:

  • The key idea of Focal Loss is to introduce a tunable parameter that reduces the weight of easy samples so that the model pays more attention to hard samples.
  • OHEM (Online Hard Example Mining) is a training strategy for handling class imbalance. In each training iteration, OHEM selects a small number of hard examples and adds them to the training set, improving the model's ability to learn difficult cases (see the sketch below).
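
  A rough sketch of the OHEM idea, as a hypothetical top-k selection over per-sample losses (not the original implementation):

```python
import torch
import torch.nn.functional as F

def ohem_cross_entropy(logits, targets, keep_ratio=0.25):
    """Back-propagate only the hardest fraction of samples in the batch."""
    per_sample = F.cross_entropy(logits, targets, reduction="none")
    k = max(1, int(keep_ratio * per_sample.numel()))
    hard_losses, _ = per_sample.topk(k)  # largest losses = hardest examples
    return hard_losses.mean()
```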

The differences:

  • Focal Loss is a loss function, while OHEM is a training strategy.
  • Focal Loss mainly reduces the impact of easy samples on training and improves the model's ability to learn hard samples, whereas OHEM strengthens the model mainly by mining hard examples.

That concludes this introduction to Focal Loss. If there are any mistakes, please correct me!

Origin: blog.csdn.net/qq_38683460/article/details/131015223