A simple explanation of weighted BCE Loss and weighted IoU Loss for small-target segmentation

These two loss functions come from the paper "F³Net: Fusion, Feedback and Focus for Salient Object Detection", where they are used to handle small targets. The traditional BCE Loss has the following three problems:

  • It simply averages the BCE over all pixels, ignoring the structure of the target object
  • For small targets, the loss over the whole image is dominated by the background class, making the foreground hard to learn
  • Pixels at object edges are very prone to misclassification and should not be given the same weight as other pixels

The natural solution is to weight pixels differently by location. Specifically, the highest weight should sit at the object's edges, and the weight should decay with distance from the edge. So how can this weighting be implemented in a principled, rather than crude, way? The paper's weighted BCE Loss is:

$$
L_{wbce}^s=-\frac{\sum_{i=1}^H \sum_{j=1}^W\left(1+\gamma \alpha_{ij}^s\right) \sum_{l=0}^1 \mathbf{1}\left(g_{ij}^s=l\right) \log \operatorname{Pr}\left(p_{ij}^s=l \mid \Psi\right)}{\sum_{i=1}^H \sum_{j=1}^W \gamma \alpha_{ij}^s}
$$

Here $\alpha_{ij}^s$ is the weight of the pixel at position $(i, j)$. If $\alpha_{ij}^s$ is held constant for every pixel, all pixels count equally and the loss reduces to an ordinary (rescaled) per-pixel average, i.e. the unweighted case.
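A small sanity check may help here: the inner sum over $l$ in the formula is exactly the ordinary per-pixel BCE, $-[\,g\log \operatorname{Pr}(p{=}1) + (1-g)\log \operatorname{Pr}(p{=}0)\,]$. The toy values below are hypothetical, chosen only to verify this against PyTorch's built-in:

```python
import torch
import torch.nn.functional as F

# The inner sum over l is just the ordinary per-pixel BCE:
# -[ g*log Pr(p=1) + (1-g)*log Pr(p=0) ].  Check with a toy logit.
logit = torch.tensor([0.7])
g = torch.tensor([1.0])

p1 = torch.sigmoid(logit)               # Pr(p = 1)
manual = -(g * torch.log(p1) + (1 - g) * torch.log(1 - p1))
builtin = F.binary_cross_entropy_with_logits(logit, g, reduction='none')
print(torch.allclose(manual, builtin))  # True
```

The weighting then only changes how these per-pixel terms are averaged.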

Setting aside the rest of the formula, focus on how $\alpha_{ij}^s$ is computed:

$$
\alpha_{ij}^s=\left|\frac{\sum_{m, n \in A_{ij}} g_{mn}^s}{\sum_{m, n \in A_{ij}} 1}-g_{ij}^s\right|
$$

Here $g_{ij}^s$ is the ground-truth value at position $(i, j)$ (1 or 0, corresponding to foreground or background), and $A_{ij}$ denotes the pixels surrounding $(i, j)$.
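Since the first term is just the mean of $g$ over a box neighborhood, $\alpha$ can be computed with average pooling. A minimal sketch, using a hypothetical 3×3 window on a toy 5×5 mask:

```python
import torch
import torch.nn.functional as F

# alpha_ij = | mean of g over the neighborhood A_ij  -  g_ij |.
# With a box neighborhood, the local mean is exactly an average pooling
# (hypothetical 3x3 window on a toy 5x5 mask).
g = torch.zeros(1, 1, 5, 5)
g[0, 0, 2, 2] = 1.0                      # single foreground pixel

local_mean = F.avg_pool2d(g, kernel_size=3, stride=1, padding=1)
alpha = torch.abs(local_mean - g)

print(alpha[0, 0, 2, 2])   # isolated foreground pixel -> alpha = |1/9 - 1| = 8/9
print(alpha[0, 0, 0, 0])   # background far from any edge -> alpha = 0
```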

Consider a few special cases. Suppose every $g_{mn}^s$ in the neighborhood is 0 while $g_{ij}^s$ is 1: the current pixel is foreground surrounded by background, i.e. a tiny target, and it receives a high weight. Conversely, if every $g_{mn}^s$ is 0 and $g_{ij}^s$ is also 0, both the pixel and its surroundings are background, which yields a low weight. Visualizing the weight map [figure in the original post], the weights near object edges are high (red), while pixels far from any edge fall to zero. In a sense, this is a boundary-aware method that requires no explicit edge information as input.
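The weight-map picture can be reproduced on a toy mask. The square size and window below are hypothetical; the point is that $\alpha$ peaks at the object boundary and vanishes in the interior:

```python
import torch
import torch.nn.functional as F

# Weight map for a toy mask with a 4x4 foreground square: pixels on the
# boundary of the square get large alpha, pixels far from the edge get ~0.
mask = torch.zeros(1, 1, 10, 10)
mask[0, 0, 3:7, 3:7] = 1.0

alpha = torch.abs(F.avg_pool2d(mask, kernel_size=3, stride=1, padding=1) - mask)

print(alpha[0, 0, 3, 3] > alpha[0, 0, 5, 5])  # True: square corner vs interior
print(alpha[0, 0, 0, 0].item())               # far background: 0.0
```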

Next, the weighted IoU Loss. Note that the IoU concept is naturally suited to small targets, so weighting the IoU Loss mainly serves to keep it consistent with the weighted BCE. Its expression is:

$$
L_{wiou}^s=1-\frac{\sum_{i=1}^H \sum_{j=1}^W\left(g t_{ij}^s \cdot p_{ij}^s\right)\left(1+\gamma \alpha_{ij}^s\right)}{\sum_{i=1}^H \sum_{j=1}^W\left(g t_{ij}^s+p_{ij}^s-g t_{ij}^s \cdot p_{ij}^s\right)\left(1+\gamma \alpha_{ij}^s\right)}
$$

Note in particular that the $\left(1+\gamma \alpha_{ij}^s\right)$ factors in the numerator and denominator cannot be cancelled: they are not a constant but vary with $(i, j)$. The idea is still that pixels closer to the edge contribute more to the IoU computation.
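The non-cancellation point can be checked numerically. The four pixel values and $\alpha$'s below are hypothetical; with uniform weights the loss comes out different, confirming the per-pixel factors matter:

```python
import torch

# Toy weighted IoU on 4 pixels: the (1 + gamma*alpha) factors do not cancel
# between numerator and denominator because alpha varies per pixel.
gt    = torch.tensor([1.0, 1.0, 0.0, 0.0])
p     = torch.tensor([0.9, 0.2, 0.1, 0.8])   # predicted probabilities
alpha = torch.tensor([0.5, 0.0, 0.5, 0.0])   # hypothetical per-pixel alpha
gamma = 5
w = 1 + gamma * alpha

inter = (gt * p * w).sum()
union = ((gt + p - gt * p) * w).sum()
wiou = 1 - inter / union

# Compare with the unweighted IoU loss on the same pixels:
inter_u = (gt * p).sum()
union_u = (gt + p - gt * p).sum()
iou_u = 1 - inter_u / union_u
print(abs(wiou - iou_u) > 1e-6)  # True: the weighting changes the value
```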

The code is implemented as follows:

```python
import torch
import torch.nn.functional as F

def structure_loss(pred, mask):
    # alpha weight: |local mean of the mask - mask|, with gamma = 5
    weit = 1 + 5*torch.abs(F.avg_pool2d(mask, kernel_size=31, stride=1, padding=15) - mask)

    # weighted BCE (pred are raw logits; sigmoid is applied internally)
    wbce = F.binary_cross_entropy_with_logits(pred, mask, reduction='none')
    wbce = (weit*wbce).sum(dim=(2, 3)) / weit.sum(dim=(2, 3))

    # weighted IoU (the +1 terms smooth the ratio against empty masks)
    pred = torch.sigmoid(pred)
    inter = ((pred * mask)*weit).sum(dim=(2, 3))
    union = ((pred + mask)*weit).sum(dim=(2, 3))
    wiou = 1 - (inter + 1)/(union - inter + 1)
    return (wbce + wiou).mean()
```

Note that the pred passed in here (i.e., the network's output) must not be passed through sigmoid beforehand, because sigmoid is already applied inside the function. When using it, take care to avoid applying sigmoid twice.
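To see why the double sigmoid is harmful, note that `binary_cross_entropy_with_logits` applies sigmoid internally, so feeding it pre-sigmoided values distorts the loss. A toy check:

```python
import torch
import torch.nn.functional as F

# binary_cross_entropy_with_logits applies sigmoid internally, so a
# confident correct prediction gets a much larger loss if sigmoid is
# mistakenly applied first.
logits = torch.tensor([4.0])
target = torch.tensor([1.0])

correct = F.binary_cross_entropy_with_logits(logits, target)
double  = F.binary_cross_entropy_with_logits(torch.sigmoid(logits), target)
print(correct.item() < double.item())  # True: double sigmoid inflates the loss
```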

The loss function involves two hyperparameters. The first is $\gamma$, the strength of the $\alpha_{ij}$ weighting: $\gamma = 0$ means no weighting, and the larger $\gamma$ is, the more the network relies on $\alpha_{ij}$ to weight different positions, i.e. the larger the weight differences between pixel positions become. The code uses $\gamma = 5$.

The other is $A_{ij}$, the specific definition of the "surrounding pixels"; the code implements the neighborhood average with F.avg_pool2d (kernel_size=31).
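The kernel size effectively sets the radius of $A_{ij}$: a larger window spreads nonzero $\alpha$ further from the object boundary. A sketch with two hypothetical window sizes on a toy mask:

```python
import torch
import torch.nn.functional as F

# Larger pooling kernel = larger neighborhood A_ij, so alpha stays nonzero
# further away from the object boundary.
mask = torch.zeros(1, 1, 32, 32)
mask[0, 0, 12:20, 12:20] = 1.0           # 8x8 foreground square

a_small = torch.abs(F.avg_pool2d(mask, 3, stride=1, padding=1) - mask)
a_large = torch.abs(F.avg_pool2d(mask, 15, stride=1, padding=7) - mask)

# A background pixel a few steps outside the square: invisible to the
# 3x3 window, but inside the 15x15 one.
print(a_small[0, 0, 8, 16].item() == 0.0)  # True
print(a_large[0, 0, 8, 16].item() > 0.0)   # True
```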


Origin blog.csdn.net/qq_40714949/article/details/128998685