Loss functions commonly used in deep learning object detection (continuously updated)

1. binary_cross_entropy

Binary cross-entropy is commonly used for binary classification problems. For example, when the first-stage RPN of a two-stage object detector generates proposals, it must classify positive and negative samples, and the loss used for this classification is binary cross-entropy. Its formula is as follows:
$$\text{Loss} = -\frac{1}{N} \sum_{i=1}^{N}\Big[\, y_i \cdot \log\big(p(y_i)\big) + (1-y_i) \cdot \log\big(1-p(y_i)\big)\Big]$$
The formula is easy to understand; for more theoretical background, see: Quickly understand binary cross entropy.
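As a quick sanity check of the formula (the probabilities and labels below are made up for illustration), here is a minimal sketch that computes the loss by hand and compares it with PyTorch's built-in F.binary_cross_entropy:

import torch
import torch.nn.functional as F

# p(y_i): predicted probability of being a positive sample (already after sigmoid)
p = torch.tensor([0.9, 0.2, 0.7, 0.4])
# y_i: ground-truth labels, 1 = positive sample, 0 = negative sample
y = torch.tensor([1.0, 0.0, 1.0, 0.0])

# Loss = -1/N * sum( y*log(p) + (1-y)*log(1-p) )
manual = -(y * torch.log(p) + (1 - y) * torch.log(1 - p)).mean()
builtin = F.binary_cross_entropy(p, y)  # expects probabilities, not logits

print(manual.item(), builtin.item())  # the two values match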

import torch
import torch.nn.functional as F
# weight_reduce_loss comes from mmdetection (mmdet/models/losses/utils.py)
from mmdet.models.losses.utils import weight_reduce_loss


def binary_cross_entropy(pred,   # predicted scores (logits) for the positive class
                         label,  # ground-truth labels
                         weight=None,
                         reduction='mean',
                         avg_factor=None,
                         class_weight=None,
                         ignore_index=-100,
                         avg_non_ignore=False):

    # The default value of ignore_index is the same as F.cross_entropy
    ignore_index = -100 if ignore_index is None else ignore_index

    if pred.dim() != label.dim():  # labels are class indices, not yet in binary/one-hot form
        label, weight, valid_mask = _expand_onehot_labels(   # expand label to the same shape as pred
            label, weight, pred.size(-1), ignore_index)
    else:
        # should mask out the ignored elements
        valid_mask = ((label >= 0) & (label != ignore_index)).float()
        if weight is not None:
            # The inplace writing method will have a mismatched broadcast
            # shape error if the weight and valid_mask dimensions
            # are inconsistent such as (B,N,1) and (B,N,C).
            weight = weight * valid_mask
        else:
            weight = valid_mask

    # average loss over non-ignored elements
    if (avg_factor is None) and avg_non_ignore and reduction == 'mean':
        avg_factor = valid_mask.sum().item()

    # weighted element-wise losses
    weight = weight.float()
    loss = F.binary_cross_entropy_with_logits(
        pred, label.float(), pos_weight=class_weight, reduction='none')
    # do the reduction for the weighted loss
    loss = weight_reduce_loss(
        loss, weight, reduction=reduction, avg_factor=avg_factor)

    return loss

# Helper specific to binary cross-entropy loss
def _expand_onehot_labels(labels, label_weights, label_channels, ignore_index):
    """Expand onehot labels to match the size of prediction."""
    bin_labels = labels.new_full((labels.size(0), label_channels), 0)    # e.g. [196608, 1]
    valid_mask = (labels >= 0) & (labels != ignore_index)                # e.g. [196608], all True
    inds = torch.nonzero(                   # indices of elements that are valid and whose label is within [0, label_channels)
        valid_mask & (labels < label_channels), as_tuple=False)

    if inds.numel() > 0:
        bin_labels[inds, labels[inds]] = 1                               # positive samples are set to 1, the rest stay 0

    valid_mask = valid_mask.view(-1, 1).expand(labels.size(0),           # e.g. [196608, 1], all True
                                               label_channels).float()
    if label_weights is None:
        bin_label_weights = valid_mask
    else:
        bin_label_weights = label_weights.view(-1, 1).repeat(1, label_channels)  # sampled positive and negative samples have weight 1, the rest are 0
        bin_label_weights *= valid_mask

    return bin_labels, bin_label_weights, valid_mask
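To make the helper above more concrete, here is a small sketch with made-up labels (assuming three foreground classes, so label_channels=3; a label at or above label_channels, e.g. the background index in this convention, yields an all-zero row but still counts as valid, while the ignored label gets weight 0):

import torch

# made-up labels: class indices 0/1/2, one label == label_channels, one ignored label
labels = torch.tensor([0, 2, 1, 3, -100])
label_channels = 3

bin_labels, bin_weights, valid = _expand_onehot_labels(
    labels, None, label_channels, ignore_index=-100)

print(bin_labels)
# tensor([[1, 0, 0],
#         [0, 0, 1],
#         [0, 1, 0],
#         [0, 0, 0],   <- label 3 >= label_channels: all-zero row, but still valid
#         [0, 0, 0]])  <- ignored label: all-zero row and weight 0
print(bin_weights)
# tensor([[1., 1., 1.],
#         [1., 1., 1.],
#         [1., 1., 1.],
#         [1., 1., 1.],
#         [0., 0., 0.]])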

The function F.binary_cross_entropy_with_logits is used here; it applies the sigmoid internally, i.e. it combines a Sigmoid layer and BCELoss in a single operation. This is numerically more stable than a plain Sigmoid followed by BCELoss, because fusing the two operations into one layer allows the log-sum-exp trick to be used.
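A minimal sketch with made-up tensors showing that the fused call gives the same value as an explicit sigmoid followed by BCE:

import torch
import torch.nn.functional as F

logits = torch.randn(4)                      # raw scores before sigmoid
target = torch.tensor([1.0, 0.0, 1.0, 0.0])  # made-up binary labels

# two-step version: explicit sigmoid, then BCE on probabilities
loss_two_step = F.binary_cross_entropy(torch.sigmoid(logits), target)
# fused version: sigmoid + BCE in one call (uses the log-sum-exp trick internally)
loss_fused = F.binary_cross_entropy_with_logits(logits, target)

print(torch.allclose(loss_two_step, loss_fused))  # True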

The effect of the reduction parameter:

$$\ell(x, y)=\begin{cases} \operatorname{mean}(L), & \text{if reduction} = \text{'mean'} \\ \operatorname{sum}(L), & \text{if reduction} = \text{'sum'} \end{cases}$$
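A short sketch (random made-up tensors) of how the reduction modes of F.binary_cross_entropy_with_logits relate to one another:

import torch
import torch.nn.functional as F

logits = torch.randn(2, 3)
target = torch.randint(0, 2, (2, 3)).float()

per_element = F.binary_cross_entropy_with_logits(logits, target, reduction='none')
mean_loss = F.binary_cross_entropy_with_logits(logits, target, reduction='mean')
sum_loss = F.binary_cross_entropy_with_logits(logits, target, reduction='sum')

print(torch.allclose(per_element.mean(), mean_loss))  # True
print(torch.allclose(per_element.sum(), sum_loss))    # True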

2. smooth_l1_loss

For bounding-box regression, the squared loss (L2 loss) is the obvious choice, but its drawback is that outliers dominate the loss. For example, suppose the true value is 1 and we make 10 predictions: one prediction is 1000 and the remaining nine are about 1. The total loss is then clearly dominated by that single prediction of 1000. Fast R-CNN therefore uses a gentler absolute-value loss (smooth L1 loss), which grows linearly with the error instead of quadratically. A numeric comparison is given after the implementation below.

Note: the difference between smooth L1 and the L1 loss is that the derivative of the L1 loss at 0 is not unique, which may hurt convergence. Smooth L1 solves this by using a quadratic function around 0 so that the loss is smooth there.

  • L2 loss:
    $$L_{2}=|f(x)-Y|^{2}, \qquad L_{2}^{\prime}=2 f^{\prime}(x)\,(f(x)-Y)$$
  • L1 loss:
    $$L_{1}=|f(x)-Y|, \qquad L_{1}^{\prime}=\pm f^{\prime}(x)$$
  • Smooth L1 loss:
    $$l_{n}=\begin{cases} 0.5\,(x_{n}-y_{n})^{2}/\beta, & \text{if } |x_{n}-y_{n}| < \beta \\ |x_{n}-y_{n}| - 0.5\,\beta, & \text{otherwise} \end{cases}$$

Loss curves of the functions above (figure omitted).

def smooth_l1_loss(pred, target, beta=1.0):
    """Smooth L1 loss.

    Args:
        pred (torch.Tensor): The prediction.
        target (torch.Tensor): The learning target of the prediction.
        beta (float, optional): The threshold in the piecewise function.
            Defaults to 1.0.

    Returns:
        torch.Tensor: Calculated loss
    """
    assert beta > 0
    if target.numel() == 0:
        # no targets: return a zero loss that still keeps pred in the graph
        return pred.sum() * 0

    assert pred.size() == target.size()
    diff = torch.abs(pred - target)
    # quadratic when |diff| < beta, linear otherwise (see the formula above)
    loss = torch.where(diff < beta, 0.5 * diff * diff / beta,
                       diff - 0.5 * beta)
    return loss
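A quick usage sketch (made-up numbers) that reproduces the outlier example from earlier: one prediction of 1000 against a true value of 1 dominates the squared loss, while smooth L1 only grows linearly:

import torch

target = torch.ones(10)               # true value is 1 for all samples
pred = torch.ones(10)
pred[0] = 1000.0                      # a single outlier prediction

l2 = (pred - target) ** 2             # element-wise squared error
sl1 = smooth_l1_loss(pred, target)    # element-wise smooth L1 (beta=1.0)

print(l2.mean().item())   # ~99800.1, dominated by the outlier
print(sl1.mean().item())  # ~99.85, grows only linearly with the outlier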

Source: blog.csdn.net/weixin_45453121/article/details/132122037