Source Code Walkthrough | Focal Loss

About the Source Code

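An earlier post covered the overall idea behind RetinaNet. Reading the paper without reading the code, however, is armchair theory and does little for practical skill. This post walks through the RetinaNet source code in detail. The GitHub repository is: https://github.com/yhenon/pytorch-retinanet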

Network Outputs

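The network has two outputs: one from the classification subnet and one from the regression subnet, both of which sit on top of the backbone. Note that this detector extracts multi-scale features: the larger feature maps preserve fine detail, while the smaller ones capture more global context. The backbone is implemented as ResNet + FPN.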

1. Classification subnet

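For example, while debugging, the output of the classification subnet had shape [99765, 80]. Here 99765 is the number of items to classify (the number of candidate boxes): the multi-scale features are concatenated and flattened, which yields 99765 candidate boxes at different positions and sizes, and each of them is then classified over 80 classes. Note that 99765 is not a fixed number; it depends on the image size. Many image segmentation and object detection pipelines treat the background as a separate class, which gives 81 classes. RetinaNet, however, works with 80 classes and does not add a background class. How it manages this becomes clear from the code.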
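
To make the dependence on image size concrete, here is a rough sketch of how the number of candidate boxes could be counted. It assumes pyramid levels P3 to P7 (strides 8 to 128) and 9 anchors per feature-map cell, as described in the RetinaNet paper. The function name `count_anchors` and the 640x832 input size are hypothetical: the post does not state the image size, and 640x832 was picked only because it reproduces the 99765 seen while debugging.

```python
import math

def count_anchors(height, width, levels=(3, 4, 5, 6, 7), anchors_per_cell=9):
    """Count the anchors over all pyramid levels for one image (illustrative helper)."""
    total_cells = 0
    for level in levels:
        stride = 2 ** level
        # Each level's feature map is the image size divided by the stride, rounded up.
        total_cells += math.ceil(height / stride) * math.ceil(width / stride)
    return total_cells * anchors_per_cell

# Hypothetical input size that happens to match the count from the debugging session.
print(count_anchors(640, 832))  # 99765
# A different image size gives a different number of candidate boxes.
print(count_anchors(512, 512))  # 49104
```

Under these assumptions the count is simply the sum of the feature-map areas times 9, which is why it changes with the input size.
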
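Each row of the classification output has length 80, and each of those 80 entries is treated as its own binary classification task: entry i says either that the box is foreground class i or that it is background. A background box is therefore one whose 80 entries are all negative, so no separate background class is needed.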
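
Before reading the tensor code, it may help to see the labelling rule for a single candidate box in plain Python. This is only a sketch of the rule that the code below applies to all boxes at once. The function name `build_target_row` is hypothetical, and the example uses 3 classes instead of 80 to keep the rows short.

```python
def build_target_row(iou_max, matched_class, num_classes=80):
    """Target row for one box: -1 = ignored, 0 = negative entry, 1 = positive entry."""
    row = [-1.0] * num_classes           # initial state: the box is ignored
    if iou_max < 0.4:
        row = [0.0] * num_classes        # background: every class entry is a negative
    elif iou_max >= 0.5:
        row = [0.0] * num_classes
        row[matched_class] = 1.0         # foreground: one-hot on the matched class
    return row

print(build_target_row(0.30, 1, num_classes=3))  # [0.0, 0.0, 0.0]    background
print(build_target_row(0.45, 1, num_classes=3))  # [-1.0, -1.0, -1.0] ignored
print(build_target_row(0.70, 1, num_classes=3))  # [0.0, 1.0, 0.0]    foreground, class 1
```

Boxes whose IoU falls between 0.4 and 0.5 keep the -1 state and are later excluded from the loss.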

import torch

# Build a matrix of shape [99765, 80] filled with -1; -1 marks the initial state
targets = torch.ones(classification.shape) * -1
targets = targets.cuda()  # move to the GPU

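# Any box whose max IoU is < 0.4 gets state 0: it is background and takes part in the loss
targets[torch.lt(IoU_max, 0.4), :] = 0
# Any box whose max IoU is >= 0.5 is foreground and also takes part in the loss;
# here only its index is recorded, the state is set to 1 further down
positive_indices = torch.ge(IoU_max, 0.5)
# Note: at this point the targets matrix holds only two states, -1 and 0
# positive_indices is a boolean mask over the foreground boxes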

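num_positive_anchors = positive_indices.sum()  # number of positive (foreground) anchors
# While debugging: assigned_annotations [99765, 5], bbox_annotation [2, 5], IoU_argmax [99765]
# This expands the annotations so that every anchor is paired with the ground-truth box it overlaps most
assigned_annotations = bbox_annotation[IoU_argmax, :]
# Reset the rows of the positive anchors to 0 before writing the one-hot label
targets[positive_indices, :] = 0  # [99765, 80]
# Set the entry of the matching class to 1, which gives a one-hot row
targets[positive_indices, assigned_annotations[positive_indices, 4].long()] = 1
# targets now holds three states: -1, 0, 1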

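# alpha is the weighting factor a in the formula, gamma is the exponent
alpha = 0.25
gamma = 2.0
alpha_factor = torch.ones(targets.shape).cuda() * alpha  # weight matrix, [99765, 80]
# Where the state is 1 (foreground) the weight is 0.25; everywhere else it is 0.75
alpha_factor = torch.where(torch.eq(targets, 1.), alpha_factor, 1. - alpha_factor)
# focal_weight below stands for (1 - pt)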
################################################################
# The following computes a * (1 - pt)^2 from the formula
focal_weight = torch.where(torch.eq(targets, 1.), 1. - classification, classification)
focal_weight = alpha_factor * torch.pow(focal_weight, gamma)
################################################################

# The following computes -log(pt). Note that this is a binary cross-entropy per entry, not a multi-class one
bce = -(targets * torch.log(classification) + (1.0 - targets) * torch.log(1.0 - classification))


cls_loss = focal_weight * bce

# Zero out the entries whose state is -1 so that they are ignored
cls_loss = torch.where(torch.ne(targets, -1.0), cls_loss, torch.zeros(cls_loss.shape).cuda())
# Sum the loss and normalize by the number of positive anchors (at least 1)
classification_losses.append(cls_loss.sum()/torch.clamp(num_positive_anchors.float(), min=1.0))
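
To see why the extra factor matters, the loss for a single entry can be worked out by hand. The code above amounts to FL(pt) = -a_t * (1 - pt)^gamma * log(pt) per entry, where pt is the predicted probability of the correct outcome. Below is a small scalar sketch in plain Python. The helper names `focal_loss` and `cross_entropy` and the two example probabilities are made up for illustration; alpha = 0.25 and gamma = 2 follow the values used in this post.

```python
import math

def cross_entropy(p, target):
    """Plain binary cross-entropy for one sigmoid output p and a 0/1 target."""
    p_t = p if target == 1 else 1.0 - p
    return -math.log(p_t)

def focal_loss(p, target, alpha=0.25, gamma=2.0):
    """Focal loss for one sigmoid output p and a 0/1 target."""
    p_t = p if target == 1 else 1.0 - p              # probability of the correct outcome
    alpha_t = alpha if target == 1 else 1.0 - alpha  # 0.25 for positives, 0.75 for negatives
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

easy_negative = (0.01, 0)  # background entry, confidently predicted as background
hard_positive = (0.10, 1)  # foreground entry, predicted with low confidence

for name, (p, t) in [("easy negative", easy_negative), ("hard positive", hard_positive)]:
    print(name, "CE =", cross_entropy(p, t), "FL =", focal_loss(p, t))
```

With these numbers the easy negative is scaled down by roughly four orders of magnitude relative to its cross-entropy, while the hard positive keeps about a fifth of it. That is how the very large number of easy background entries is kept from dominating the total.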

2. Regression subnet

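The candidate boxes of the regression subnet line up one to one with those of the classification subnet. In other words, the regression subnet also outputs 99765 rows, each of length 4, describing the position of that box.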
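
The four numbers per box are offsets relative to a rule-generated anchor rather than absolute coordinates, and they are divided by fixed scale factors. As a rough illustration, the sketch below encodes one ground-truth box against one anchor with the same arithmetic as the loss code, then applies the inverse. The names `encode` and `decode` and the example boxes are hypothetical, and `decode` is simply the algebraic inverse written for this post, not code copied from the repository.

```python
import math

SCALE = (0.1, 0.1, 0.2, 0.2)  # the same scale factors the loss code divides by

def center_form(box):
    """Convert (x1, y1, x2, y2) to (center_x, center_y, width, height)."""
    w, h = box[2] - box[0], box[3] - box[1]
    return box[0] + 0.5 * w, box[1] + 0.5 * h, w, h

def encode(anchor, gt):
    """Offsets that move `anchor` onto the ground-truth box `gt`."""
    ax, ay, aw, ah = center_form(anchor)
    gx, gy, gw, gh = center_form(gt)
    gw, gh = max(gw, 1.0), max(gh, 1.0)  # at least 1 pixel, as in the loss code
    raw = ((gx - ax) / aw, (gy - ay) / ah, math.log(gw / aw), math.log(gh / ah))
    return tuple(r / s for r, s in zip(raw, SCALE))

def decode(anchor, offsets):
    """Inverse of encode: recover a box from an anchor and its offsets."""
    ax, ay, aw, ah = center_form(anchor)
    dx, dy, dw, dh = (o * s for o, s in zip(offsets, SCALE))
    cx, cy = ax + dx * aw, ay + dy * ah
    w, h = aw * math.exp(dw), ah * math.exp(dh)
    return (cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h)

anchor = (0.0, 0.0, 100.0, 100.0)
gt_box = (10.0, 20.0, 110.0, 80.0)
offsets = encode(anchor, gt_box)
print(offsets)                  # the values the regression subnet is trained to predict
print(decode(anchor, offsets))  # recovers gt_box up to floating-point error
```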

    if positive_indices.sum() > 0:
        assigned_annotations = assigned_annotations[positive_indices, :]
        # Anchor values; the anchors are generated by fixed rules
        anchor_widths_pi = anchor_widths[positive_indices]
        anchor_heights_pi = anchor_heights[positive_indices]
        anchor_ctr_x_pi = anchor_ctr_x[positive_indices]
        anchor_ctr_y_pi = anchor_ctr_y[positive_indices]
        # The ground-truth annotations, converted to 4 values: width, height, center x, center y
        gt_widths  = assigned_annotations[:, 2] - assigned_annotations[:, 0]
        gt_heights = assigned_annotations[:, 3] - assigned_annotations[:, 1]
        gt_ctr_x   = assigned_annotations[:, 0] + 0.5 * gt_widths
        gt_ctr_y   = assigned_annotations[:, 1] + 0.5 * gt_heights

        # Clamp width and height to at least 1 pixel
        gt_widths  = torch.clamp(gt_widths, min=1)
        gt_heights = torch.clamp(gt_heights, min=1)
        # Compute the four offsets between the rule-generated anchors and the ground truth
        # The regression subnet learns these offsets rather than the coordinates directly
        targets_dx = (gt_ctr_x - anchor_ctr_x_pi) / anchor_widths_pi
        targets_dy = (gt_ctr_y - anchor_ctr_y_pi) / anchor_heights_pi
        targets_dw = torch.log(gt_widths / anchor_widths_pi)
        targets_dh = torch.log(gt_heights / anchor_heights_pi)
        # Stack them and transpose, so that each row holds the 4 offsets of one anchor
        targets = torch.stack((targets_dx, targets_dy, targets_dw, targets_dh))
        targets = targets.t()
        # Normalize the targets by fixed scale factors; inference has to apply the inverse
        targets = targets/torch.Tensor([[0.1, 0.1, 0.2, 0.2]]).cuda()


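        # Mask of the non-positive anchors (not used below); `1 - mask` fails on
        # boolean masks in recent PyTorch versions, so invert with ~ instead
        negative_indices = ~positive_indices
        # Absolute difference between the target offsets and the predicted offsets
        regression_diff = torch.abs(targets - regression[positive_indices, :])
        # Smooth L1 loss: a piecewise function that is both continuous and differentiable
        regression_loss = torch.where(
            torch.le(regression_diff, 1.0 / 9.0),
            0.5 * 9.0 * torch.pow(regression_diff, 2),  # quadratic where diff <= 1/9
            regression_diff - 0.5 / 9.0                 # linear elsewhere
        )
        regression_losses.append(regression_loss.mean())
    else:
        # No positive anchors in this image: contribute a zero regression loss
        regression_losses.append(torch.tensor(0).float().cuda())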
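
The piecewise function at the end is the smooth L1 loss with its switch point at 1/9. A scalar sketch in plain Python shows that the two branches meet at that point. The function name `smooth_l1` and the parameter name `beta` are choices made for this illustration.

```python
def smooth_l1(diff, beta=1.0 / 9.0):
    """Smooth L1 loss for one difference: quadratic near zero, linear further out."""
    diff = abs(diff)
    if diff <= beta:
        return 0.5 * diff * diff / beta  # same as 0.5 * 9.0 * diff**2 when beta = 1/9
    return diff - 0.5 * beta             # same as diff - 0.5 / 9.0 when beta = 1/9

beta = 1.0 / 9.0
print(smooth_l1(0.0))          # 0.0
print(smooth_l1(beta))         # quadratic branch at the switch point, 0.5 * beta
print(smooth_l1(beta + 1e-9))  # linear branch just past it, almost the same value
print(smooth_l1(1.0))          # large errors grow linearly, 1 - 0.5 / 9
```

Small errors are penalized quadratically, which keeps gradients small near the optimum, while large errors are penalized linearly, which limits the influence of outliers.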

Summary

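RetinaNet is a one-stage detector. This post went through how the loss is computed, from the point of view of its two subnets, classification and regression. When reading deep learning code, it pays to keep track of how tensor shapes change, because that makes the purpose of each step much easier to understand. When I first read this code I struggled and could not see what the author intended, but stepping through it in a debugger gradually made things clear, and along the way I became familiar with many basic tensor operations. That is my reading of the RetinaNet source code; I hope it helps others who are studying RetinaNet.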