The principle and impact of the two weighting parameters in Focal loss

First of all, one detail about weighting a loss function needs to be made clear: the intuition has to be reversed. Because the loss is minimized, the part you want to protect, i.e. whose loss you are willing to let stay large, should be given a small weight; the part you want to penalize more should be given a larger weight, so that minimizing the total forces that part to become small.
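A toy numerical sketch of that point (the per-class loss values and the weights below are made up purely for illustration): whichever term carries the larger weight is the one the optimizer is forced to drive down.

```python
# Made-up per-class average losses: imagine class_a currently fits badly.
losses = {"class_a": 0.9, "class_b": 0.3}

def weighted_total(w_a, w_b):
    # The optimizer minimizes this sum, so the term with the larger weight
    # must shrink the most for the total to become small.
    return w_a * losses["class_a"] + w_b * losses["class_b"]

print(weighted_total(0.9, 0.1))  # class_a dominates the objective -> it will be forced small
print(weighted_total(0.1, 0.9))  # class_a barely matters -> it is allowed to stay large
```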

 

Focal loss:

FL(p) = \begin{cases} -\alpha (1-p)^{\gamma} \log(p) & \text{if } y = 1 \\ -(1-\alpha)\, p^{\gamma} \log(1-p) & \text{otherwise} \end{cases}

Its two core parameters are \alpha and \gamma.
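The following is a minimal PyTorch sketch of the piecewise formula above for the binary case; the function name binary_focal_loss and the defaults alpha=0.25, gamma=2.0 are illustrative choices, not values prescribed by this post.

```python
import torch

def binary_focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-7):
    """p: predicted probability of class 1; y: labels in {0, 1}."""
    p = p.clamp(eps, 1.0 - eps)                                   # avoid log(0)
    loss_pos = -alpha * (1.0 - p) ** gamma * torch.log(p)         # y = 1 branch of the formula
    loss_neg = -(1.0 - alpha) * p ** gamma * torch.log(1.0 - p)   # y = 0 branch
    return torch.where(y == 1, loss_pos, loss_neg).mean()

# Toy usage with made-up predictions and labels.
p = torch.tensor([0.9, 0.2, 0.7, 0.05])
y = torch.tensor([1, 1, 0, 0])
print(binary_focal_loss(p, y))
```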

 

Among them, \alpha acts much like an ordinary class weight: it rebalances the two classes. If y = 1 has more samples than y = 0, then \alpha should be less than 0.5, so the majority class is down-weighted while the minority class, weighted by (1-\alpha) > 0.5, counts for more in the loss and is not drowned out. The conclusion is: the more imbalanced the samples, the closer \alpha should be to 0 or 1.
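One common heuristic, assumed here rather than mandated by the post, is to tie \alpha to inverse class frequency, so that when y = 1 is the majority class \alpha automatically falls below 0.5:

```python
import torch

# Toy labels where y = 1 is the majority class (8 positives, 2 negatives).
y = torch.tensor([1, 1, 1, 1, 1, 1, 1, 1, 0, 0])
n_pos = (y == 1).sum().item()
n_neg = (y == 0).sum().item()

# Inverse-frequency choice: weight the y = 1 term by the fraction of y = 0 samples,
# so the rarer y = 0 is, the smaller alpha becomes and the larger (1 - alpha) gets.
alpha = n_neg / (n_pos + n_neg)
print(alpha)  # 0.2 < 0.5 because y = 1 outnumbers y = 0
```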

\gamma, on the other hand, is what separates hard examples from easy ones. The modulating factor (1-p)^{\gamma} (and p^{\gamma} in the y = 0 branch) shrinks the loss of samples that are already predicted confidently and correctly, so the larger \gamma is, the more the training signal concentrates on hard samples and the more the predicted probabilities get pushed toward the two ends of the 0-1 range. The small numerical comparison below makes the reasoning concrete:
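A minimal numeric sketch (the probabilities 0.9 and 0.1 are made-up examples, and \alpha is left out for clarity) of how the (1-p)^{\gamma} factor scales the y = 1 loss for an easy versus a hard sample as \gamma grows:

```python
import math

# Compare the loss contribution of an "easy" positive sample (p = 0.9)
# and a "hard" one (p = 0.1) as gamma increases.
for gamma in (0, 1, 2, 5):
    easy = (1 - 0.9) ** gamma * -math.log(0.9)   # well classified: collapses quickly
    hard = (1 - 0.1) ** gamma * -math.log(0.1)   # badly classified: barely reduced
    print(f"gamma={gamma}: easy={easy:.2e}, hard={hard:.3f}, hard/easy={hard / easy:.0f}")
```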

 


Origin: blog.csdn.net/yangyehuisw/article/details/106216283