Original paper: T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár. Focal loss for dense object detection. In ICCV, 2017.
- The paper designs a dedicated object detection network, RetinaNet, to verify the loss experimentally.
- In the paper's experiments on RetinaNet, the best-performing pair of loss parameters is $\gamma = 2$, $\alpha = 0.25$.
Purpose: to address the imbalance between positive and negative samples in object detection models.
Method: the loss function suppresses the contribution of correctly classified samples to the weight update. The more confidently a sample is classified correctly, the smaller its contribution.
For example, a positive sample with a score of 0.99 contributes less than a positive sample with a score of 0.70, and a negative sample with a score of 0.01 contributes less than a negative sample with a score of 0.40.
1. Focal Loss
We first introduce the cross-entropy loss (Cross Entropy) and the balanced cross-entropy loss (Balanced Cross Entropy); these two functions motivated the proposed Focal Loss.
Let $y$ denote the class label, with 1 for positive and 0 for negative; let $p$ be the model's predicted probability for the positive class, so that $1-p$ is the predicted probability for the negative class.
- Cross entropy loss function
$$CE(p,y)=\begin{cases} -\log(p) & \text{if } y=1 \\ -\log(1-p) & \text{if } y=0 \end{cases}$$
Let
$$p_t=\begin{cases} p & \text{if } y=1 \\ 1-p & \text{if } y=0 \end{cases}$$
Then
$$CE(p,y)=CE(p_t)=-\log(p_t)$$
- Balanced cross entropy loss function
$$CE(p_t)=-\alpha_t\log(p_t)$$
where $\alpha\in[0,1]$, and
$$\alpha_t=\begin{cases} \alpha & \text{if } y=1 \\ 1-\alpha & \text{if } y=0 \end{cases}$$
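The two definitions above can be sketched in a few lines of plain Python. This is only an illustration for a single sample: the function names `p_t`, `cross_entropy` and `balanced_cross_entropy` are made up for these notes and do not come from the paper.

```python
import math

def p_t(p, y):
    """Probability assigned to the true class: p if y == 1, else 1 - p."""
    return p if y == 1 else 1.0 - p

def cross_entropy(p, y):
    """CE(p_t) = -log(p_t) for one sample."""
    return -math.log(p_t(p, y))

def balanced_cross_entropy(p, y, alpha):
    """Balanced CE: -alpha_t * log(p_t), with alpha_t = alpha for positives
    and 1 - alpha for negatives."""
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * math.log(p_t(p, y))

# Illustrative values only: one positive and one negative sample.
print(cross_entropy(0.9, 1), cross_entropy(0.9, 0))
print(balanced_cross_entropy(0.9, 1, alpha=0.25),
      balanced_cross_entropy(0.9, 0, alpha=0.25))
```

Note that $\alpha_t$ only rescales the loss by class; it does not distinguish between easy and hard samples within a class.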
1.1 Focal Loss
$$FL(p_t)=-(1-p_t)^\gamma\log(p_t)$$
$\gamma \ge 0$ is a hyperparameter called the tunable focusing parameter. The following shows the loss curves for different values of $\gamma$. In the paper's experiments, $\gamma=2$ works best.
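To make the effect of the factor $(1-p_t)^\gamma$ concrete, the sketch below evaluates the focal loss on the four scores from the example near the top of these notes. With $\gamma=2$ the factor is about 0.0001 for $p_t=0.99$ but 0.09 for $p_t=0.70$, so the well-classified sample is suppressed far more strongly. The function name `focal_loss` is an assumption made for illustration; setting $\gamma=0$ recovers the plain cross-entropy.

```python
import math

def focal_loss(p, y, gamma=2.0):
    """Focal loss for one sample; p is the predicted positive-class probability."""
    pt = p if y == 1 else 1.0 - p  # probability of the true class
    return -((1.0 - pt) ** gamma) * math.log(pt)

# (score, label) pairs taken from the example near the top of these notes.
samples = [(0.99, 1), (0.70, 1), (0.01, 0), (0.40, 0)]

# gamma = 0 is plain cross-entropy; gamma = 2 is the value the paper found best.
for gamma in (0.0, 2.0):
    losses = [focal_loss(p, y, gamma) for p, y in samples]
    print(f"gamma={gamma}:", [f"{v:.5f}" for v in losses])
```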
1.2 $\alpha$-balanced Focal Loss
This is the actual loss function used.
$$FL(p_t)=-\alpha_t(1-p_t)^\gamma\log(p_t)$$
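A minimal sketch of this $\alpha$-balanced form follows, applied to a made-up imbalanced batch. Everything beyond the formula is an assumption for illustration: the function name `alpha_focal_loss`, the clipping constant `eps`, and the batch of 1000 easy negatives plus one hard positive. It compares the share of the total loss contributed by the hard positive under cross-entropy and under focal loss.

```python
import math

def alpha_focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-7):
    """Alpha-balanced focal loss for one sample.
    p is the predicted positive-class probability, y is 0 or 1."""
    p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0); eps is an assumption
    pt = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - pt) ** gamma * math.log(pt)

# Hypothetical batch: 1000 easy negatives (score 0.01) and one hard positive (score 0.10).
scores = [0.01] * 1000 + [0.10]
labels = [0] * 1000 + [1]

def positive_share(loss_fn):
    """Fraction of the summed loss that comes from the single positive sample."""
    losses = [loss_fn(p, y) for p, y in zip(scores, labels)]
    return losses[-1] / sum(losses)

# alpha = 0.5 and gamma = 0 give 0.5 * CE, so the share equals that of plain cross-entropy.
ce_share = positive_share(lambda p, y: alpha_focal_loss(p, y, alpha=0.5, gamma=0.0))
fl_share = positive_share(alpha_focal_loss)
print(f"positive share under CE: {ce_share:.3f}, under focal loss: {fl_share:.3f}")
```

In this toy batch the easy negatives account for most of the cross-entropy total, whereas under the focal loss the hard positive accounts for nearly all of it, which is the behaviour described in the Purpose and Method lines above.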
2. RetinaNet Network
Structure diagram:
The network will not be described in detail.
Finally, the experimental data: