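1 Overview

Whether a detector is anchor-based or anchor-free, training needs positive/negative sample matching (label assignment) to compute the classification or foreground/background loss. Current approaches fall into two broad categories:
Category 1, fixed label assignment: commonly used methods include MaxIoU, ATSS, and FCOS.
Category 2, dynamic label assignment: commonly used methods include SimOTA and TaskAlign.
Matching principle: an anchor can be assigned to at most one GT, but one GT can be assigned multiple anchors.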
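
The matching principle can be illustrated with a small sketch. The values below are made up, and the encoding (-1 = ignored, 0 = negative, k > 0 = matched to GT k) is assumed to mirror the MaxIoU assigner code shown later in this article:

```python
from collections import defaultdict

# Hypothetical assignment result for 6 anchors and 2 GTs.
# Assumed encoding: -1 = ignored, 0 = negative/background,
# k > 0 = matched to GT number k (1-based).
assigned_gt_inds = [0, 1, 1, -1, 2, 0]

# Each anchor stores exactly one value, so it can never belong to two GTs.
# Inverting the mapping shows that one GT may own several anchors.
anchors_per_gt = defaultdict(list)
for anchor_idx, gt_ind in enumerate(assigned_gt_inds):
    if gt_ind > 0:
        anchors_per_gt[gt_ind].append(anchor_idx)

print(dict(anchors_per_gt))  # {1: [1, 2], 2: [4]}
```

Storing one integer per anchor enforces the "one anchor, one GT" rule by construction.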

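2 MaxIoU matching strategy

MaxIoU computes the IoU between each anchor box and each GT, then applies separate positive and negative thresholds to decide which anchors are positive and which are negative samples. It is commonly used in Faster R-CNN, Mask R-CNN, SSD, YOLOv3, and similar detectors.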
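
As a minimal sketch of the IoU computation this strategy relies on, assuming boxes in (x1, y1, x2, y2) format (the helper names `box_iou` and `iou_matrix` are hypothetical, not library functions):

```python
def box_iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def iou_matrix(gts, anchors):
    """Return overlaps[g][a], laid out as [num_gts, num_anchors]."""
    return [[box_iou(gt, anchor) for anchor in anchors] for gt in gts]


# Made-up boxes: one GT and three anchors.
gts = [(0, 0, 10, 10)]
anchors = [(0, 0, 10, 10), (5, 5, 15, 15), (20, 20, 30, 30)]
overlaps = iou_matrix(gts, anchors)
print(overlaps)
```

The [num_gts, num_anchors] layout matches the `overlaps` tensor consumed by the assigner code in this section.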

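2.1 MaxIoU matching steps

1. Compute the IoU between every GT and every anchor box.
2. For each anchor box, find its best-matching GT, i.e. the GT with the largest IoU, and record that maximum IoU.
3. If the maximum IoU < the negative threshold, the anchor box is a negative sample.
4. If the maximum IoU >= the positive threshold, the anchor box is a positive sample for that GT. Anchors that fall between the two thresholds keep the default label -1 and are ignored.
5. If forced positives are enabled (self.match_low_quality): for each GT, find the anchor with the largest IoU with it (its best match); provided that IoU is at least self.min_pos_iou, that anchor becomes a positive sample for the GT.
6. Problem with step 5: suppose anchor A has IoU 0.9 with GT 1 and IoU 0.8 with GT 2. By steps 2 and 4, anchor A is matched to GT 1, the GT it overlaps most. But if anchor A also happens to be the anchor with the largest IoU for GT 2, step 5 reassigns anchor A to GT 2. GT 1 may then be left without any assigned anchor, and anchor A is trained against a GT that is not its best match, so match quality drops.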

def assign_wrt_overlaps(self, overlaps, gt_labels=None):

        num_gts, num_bboxes = overlaps.size(0), overlaps.size(1)
        # 1. assign -1 by default
        assigned_gt_inds = overlaps.new_full((num_bboxes, ),-1, dtype=torch.long)
        # for each anchor, the max iou over all gts (the GT that best matches each anchor)
        max_overlaps, argmax_overlaps = overlaps.max(dim=0)
        # for each gt, the max iou over all proposals (the anchor that best matches each GT)
        gt_max_overlaps, gt_argmax_overlaps = overlaps.max(dim=1)

        # 2. assign negative: below
        # the negative inds are set to be 0
        assigned_gt_inds[(max_overlaps >= 0) & (max_overlaps < self.neg_iou_thr)] = 0
        
        # 3. assign positive: above positive IoU threshold
        pos_inds = max_overlaps >= self.pos_iou_thr
        assigned_gt_inds[pos_inds] = argmax_overlaps[pos_inds] + 1

        if self.match_low_quality:
            # Low-quality matching will overwrite the assigned_gt_inds assigned
            # in Step 3. Thus, the assigned gt might not be the best one for
            # prediction.
            # For example, if bbox A has 0.9 and 0.8 iou with GT bbox 1 & 2,
            # bbox 1 will be assigned as the best target for bbox A in step 3.
            # However, if GT bbox 2's gt_argmax_overlaps = A, bbox A's
            # assigned_gt_inds will be overwritten to be bbox 2.
            # This might be the reason that it is not used in ROI Heads.
            for i in range(num_gts):
                if gt_max_overlaps[i] >= self.min_pos_iou:
                    # when several bboxes tie for the max iou with this gt,
                    # gt_max_assign_all decides whether all of them are assigned
                    if self.gt_max_assign_all:
                        max_iou_inds = overlaps[i, :] == gt_max_overlaps[i]
                        assigned_gt_inds[max_iou_inds] = i + 1
                    else:
                        assigned_gt_inds[gt_argmax_overlaps[i]] = i + 1   # assign only one argmax bbox

        if gt_labels is not None:
            assigned_labels = assigned_gt_inds.new_full((num_bboxes, ), -1)
            pos_inds = torch.nonzero(
                assigned_gt_inds > 0, as_tuple=False).squeeze()
            if pos_inds.numel() > 0:
                assigned_labels[pos_inds] = gt_labels[
                    assigned_gt_inds[pos_inds] - 1]
        else:
            assigned_labels = None

        return AssignResult(num_gts, assigned_gt_inds, max_overlaps, labels=assigned_labels)
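
The overwrite described in step 6 can be reproduced on a toy IoU matrix. The following is a simplified pure-Python sketch, not the library implementation: the function name `max_iou_assign` and the threshold values are assumptions, and it behaves like `gt_max_assign_all=False`:

```python
def max_iou_assign(overlaps, pos_iou_thr=0.5, neg_iou_thr=0.4,
                   min_pos_iou=0.0, match_low_quality=True):
    """overlaps[g][a] is the IoU between GT g and anchor a.

    Returns one label per anchor: -1 = ignored, 0 = negative,
    k > 0 = matched to GT k (1-based).
    """
    num_gts, num_anchors = len(overlaps), len(overlaps[0])
    assigned = [-1] * num_anchors
    for a in range(num_anchors):
        column = [overlaps[g][a] for g in range(num_gts)]
        best_iou = max(column)
        if best_iou < neg_iou_thr:
            assigned[a] = 0
        elif best_iou >= pos_iou_thr:
            assigned[a] = column.index(best_iou) + 1
    if match_low_quality:
        # Forced positives: each GT claims its single best anchor,
        # overwriting whatever the threshold step decided.
        for g in range(num_gts):
            best_iou = max(overlaps[g])
            if best_iou >= min_pos_iou:
                assigned[overlaps[g].index(best_iou)] = g + 1
    return assigned


# Rows are GT 1 and GT 2; columns are anchors A, B, C (made-up values).
overlaps = [
    [0.9, 0.30, 0.1],  # GT 1
    [0.8, 0.45, 0.2],  # GT 2
]
without_force = max_iou_assign(overlaps, match_low_quality=False)
with_force = max_iou_assign(overlaps, match_low_quality=True)
print(without_force)  # [1, -1, 0]
print(with_force)     # [2, -1, 0]
```

With forced matching, anchor A moves from GT 1 to GT 2 because GT 2 is processed later in the loop, and GT 1 ends up with no positive anchor.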

2 ATSS matching strategy (CVPR 2020)

2.1 ATSS matching steps

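1. Compute the center-point distance between every anchor box and every gt_bbox.
2. Loop over the 5 output levels. On each level, select for each gt_bbox the topk=9 anchors with the smallest L2 center distance as candidates. After this step each gt_bbox has 9x5=45 candidate anchors in total.
3. Compute the IoU between these 45 candidates and their GT, and use the sum of the mean and the standard deviation of those IoUs as the positive-sample threshold.
4. Keep the candidates whose IoU with the GT is at least that threshold.
5. Check whether each candidate's center lies inside the GT. A candidate is a positive sample only if its center is inside the GT and its IoU with the GT passes the threshold.
6. If a candidate matches several GTs at once, assign it to the GT with which it has the largest IoU.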
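
The steps above can be sketched for a single GT in plain Python. This is a simplified illustration under stated assumptions: the name `atss_assign_single_gt` is hypothetical, the boxes are made up, `topk` is reduced to 2 to keep the example small, and step 6 is omitted because only one GT is present:

```python
import math
import statistics


def atss_assign_single_gt(gt, anchors_per_level, topk=9):
    """Return (positive flat anchor indices, IoU threshold) for one GT.

    gt: (x1, y1, x2, y2); anchors_per_level: list of lists of boxes.
    """
    def center(box):
        return ((box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0)

    def iou(a, b):
        iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
        ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
        inter = iw * ih
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union if union > 0 else 0.0

    gcx, gcy = center(gt)
    candidates = []  # (flat_index, box)
    offset = 0
    for level_anchors in anchors_per_level:
        # Steps 1-2: per level, keep the topk anchors closest to the GT center.
        def dist(i):
            cx, cy = center(level_anchors[i])
            return math.hypot(cx - gcx, cy - gcy)
        order = sorted(range(len(level_anchors)), key=dist)
        for i in order[:topk]:
            candidates.append((offset + i, level_anchors[i]))
        offset += len(level_anchors)

    # Step 3: threshold = mean + sample standard deviation of candidate IoUs.
    ious = [iou(box, gt) for _, box in candidates]
    std = statistics.stdev(ious) if len(ious) > 1 else 0.0
    threshold = statistics.mean(ious) + std

    # Steps 4-5: IoU must pass the threshold and the center must be inside the GT.
    positives = []
    for (idx, box), value in zip(candidates, ious):
        cx, cy = center(box)
        margin = min(cx - gt[0], cy - gt[1], gt[2] - cx, gt[3] - cy)
        if value >= threshold and margin > 0.01:
            positives.append(idx)
    return positives, threshold


gt = (10, 10, 30, 30)
levels = [
    [(0, 0, 16, 16), (12, 12, 28, 28), (24, 24, 40, 40)],  # finer level
    [(0, 0, 32, 32), (16, 16, 48, 48)],                    # coarser level
]
positives, threshold = atss_assign_single_gt(gt, levels, topk=2)
print(positives, round(threshold, 3))
```

Only the anchor centered on the GT (flat index 1, IoU 0.64) survives here; the other candidates fall below the adaptive threshold.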

def assign(self,
           bboxes,
           num_level_bboxes,
           gt_bboxes,
           gt_bboxes_ignore=None,
           gt_labels=None,
           cls_scores=None,
           bbox_preds=None):

    INF = 100000000
    bboxes = bboxes[:, :4]
    num_gt, num_bboxes = gt_bboxes.size(0), bboxes.size(0)

    overlaps = self.iou_calculator(bboxes, gt_bboxes)
    # assign 0 by default
    assigned_gt_inds = overlaps.new_full((num_bboxes,), 0, dtype=torch.long)

    # 1. compute the center-point distance between all anchor boxes and all gt_bboxes
    gt_cx = (gt_bboxes[:, 0] + gt_bboxes[:, 2]) / 2.0
    gt_cy = (gt_bboxes[:, 1] + gt_bboxes[:, 3]) / 2.0
    gt_points = torch.stack((gt_cx, gt_cy), dim=1)  # [gt_nums, 2]

    bboxes_cx = (bboxes[:, 0] + bboxes[:, 2]) / 2.0
    bboxes_cy = (bboxes[:, 1] + bboxes[:, 3]) / 2.0
    bboxes_points = torch.stack((bboxes_cx, bboxes_cy), dim=1)  # [anchor_nums, 2]
    distances = (bboxes_points[:, None, :] - gt_points[None, :, :]).pow(2).sum(-1).sqrt()  # [anchor_nums, gt_nums]

    # 2. on each of the 5 output levels, select for each gt_bbox the topk=9 anchors with the
    #    smallest L2 center distance; afterwards each gt_bbox has 9x5=45 candidate anchors
    candidate_idxs = []
    start_idx = 0
    for level, bboxes_per_level in enumerate(num_level_bboxes):
        # on each pyramid level, for each gt,
        # select k bbox whose center are closest to the gt center
        end_idx = start_idx + bboxes_per_level
        distances_per_level = distances[start_idx:end_idx, :]
        selectable_k = min(self.topk, bboxes_per_level)

        _, topk_idxs_per_level = distances_per_level.topk(selectable_k, dim=0, largest=False)
        candidate_idxs.append(topk_idxs_per_level + start_idx)
        start_idx = end_idx
    candidate_idxs = torch.cat(candidate_idxs, dim=0)

    # 3. compute the iou between the 45 candidates and their GT; mean + std is the positive threshold
    candidate_overlaps = overlaps[candidate_idxs, torch.arange(num_gt)]  # [45, gt_nums]
    overlaps_mean_per_gt = candidate_overlaps.mean(0)  # [gt_nums]
    overlaps_std_per_gt = candidate_overlaps.std(0)  # [gt_nums]
    overlaps_thr_per_gt = overlaps_mean_per_gt + overlaps_std_per_gt
    # 4. keep the candidates whose iou with the GT is at least the threshold
    is_pos = candidate_overlaps >= overlaps_thr_per_gt[None, :]

    # 5. a candidate is positive only if its center is inside the GT and its iou passes the threshold
    for gt_idx in range(num_gt):
        candidate_idxs[:, gt_idx] += gt_idx * num_bboxes
    ep_bboxes_cx = bboxes_cx.view(1, -1).expand(num_gt, num_bboxes).contiguous().view(-1)
    ep_bboxes_cy = bboxes_cy.view(1, -1).expand(num_gt, num_bboxes).contiguous().view(-1)
    candidate_idxs = candidate_idxs.view(-1)

    # distances from each candidate center to the four sides of the GT
    l_ = ep_bboxes_cx[candidate_idxs].view(-1, num_gt) - gt_bboxes[:, 0]
    t_ = ep_bboxes_cy[candidate_idxs].view(-1, num_gt) - gt_bboxes[:, 1]
    r_ = gt_bboxes[:, 2] - ep_bboxes_cx[candidate_idxs].view(-1, num_gt)
    b_ = gt_bboxes[:, 3] - ep_bboxes_cy[candidate_idxs].view(-1, num_gt)
    is_in_gts = torch.stack([l_, t_, r_, b_], dim=1).min(dim=1)[0] > 0.01

    is_pos = is_pos & is_in_gts

    # 6. if a candidate matches several GTs, assign it to the GT with the largest iou
    overlaps_inf = torch.full_like(overlaps, -INF).t().contiguous().view(-1)
    index = candidate_idxs.view(-1)[is_pos.view(-1)]
    overlaps_inf[index] = overlaps.t().contiguous().view(-1)[index]
    overlaps_inf = overlaps_inf.view(num_gt, -1).t()

    max_overlaps, argmax_overlaps = overlaps_inf.max(dim=1)
    assigned_gt_inds[max_overlaps != -INF] = argmax_overlaps[max_overlaps != -INF] + 1

    if gt_labels is not None:
        assigned_labels = assigned_gt_inds.new_full((num_bboxes,), -1)
        pos_inds = torch.nonzero(assigned_gt_inds > 0, as_tuple=False).squeeze()
        if pos_inds.numel() > 0:
            assigned_labels[pos_inds] = gt_labels[assigned_gt_inds[pos_inds] - 1]
    else:
        assigned_labels = None
    return AssignResult(num_gt, assigned_gt_inds, max_overlaps, labels=assigned_labels)

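2.2 ATSS advantages and disadvantages

Advantages:
1. It takes the effect of the GT center point on sample matching into account.
In RetinaNet, the closer an anchor box center is to the GT center, the higher the IoU usually is; in FCOS, the closer a point is to the center, the higher the prediction quality usually is.
2. The positive/negative threshold is not set by hand but determined dynamically from the mean plus the standard deviation of the IoUs between the candidates and the GT box.
The mean reflects how well the preset anchors match the GT: a high mean means the threshold should be raised when selecting positives, and a low mean means it should be lowered. The standard deviation reflects which feature levels suit the GT: a high standard deviation means the high-quality anchor boxes are concentrated on one level, so adding the standard deviation to the threshold filters out the anchor boxes from the other levels; a low standard deviation means several levels suit the GT. Using mean plus standard deviation as the IoU threshold therefore selects suitable anchor boxes from the appropriate feature levels automatically.
3. It restricts anchor centers to the GT region.
If an anchor box's center is not inside the GT region, the anchor predicts from features outside the GT, which hurts training, so such anchors should be excluded.
4. GTs of different sizes and aspect ratios are assigned a balanced number of anchors.
5. There is only one hyperparameter, K, and it has little effect, so the method is close to hyperparameter-free.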
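
To make the role of the mean and the standard deviation in advantage 2 concrete, here is a small sketch with made-up candidate IoUs (the function name `atss_threshold` is hypothetical). It uses the sample standard deviation, which matches the default behaviour of `torch.std`:

```python
import statistics


def atss_threshold(candidate_ious):
    """Adaptive threshold: mean plus sample standard deviation."""
    return statistics.mean(candidate_ious) + statistics.stdev(candidate_ious)


# Hypothetical GT whose good anchors all sit on one pyramid level.
one_level_dominates = [0.70, 0.68, 0.66, 0.12, 0.10, 0.08, 0.06, 0.05, 0.05]
# Hypothetical GT that is matched about equally well by several levels.
several_levels_fit = [0.45, 0.42, 0.40, 0.38, 0.36, 0.35, 0.34, 0.33, 0.30]

thr_high = atss_threshold(one_level_dominates)
thr_flat = atss_threshold(several_levels_fit)
kept_high = [v for v in one_level_dominates if v >= thr_high]
kept_flat = [v for v in several_levels_fit if v >= thr_flat]

print(round(thr_high, 3), kept_high)
print(round(thr_flat, 3), kept_flat)
```

In the first case the large standard deviation pushes the threshold up to roughly 0.58, so only the three anchors from the dominant level remain. In the second case the small standard deviation leaves the threshold near the mean, at roughly 0.42.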

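Disadvantages:
If all candidates are of poor quality (the mean IoU is very low), positive samples are still forced onto the GT, which can easily lead to false detections.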
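
A quick numeric check of this weakness, using made-up IoUs and ignoring the center-inside-GT test for simplicity: even when every candidate overlaps the GT poorly, the mean-plus-standard-deviation rule still lets some through.

```python
import statistics

# Hypothetical candidate IoUs for a GT that no anchor fits well.
low_quality_ious = [0.15, 0.12, 0.10, 0.08, 0.06, 0.05, 0.04, 0.03, 0.02]

# Same rule as ATSS step 3: mean plus sample standard deviation.
low_thr = statistics.mean(low_quality_ious) + statistics.stdev(low_quality_ious)
forced_positives = [v for v in low_quality_ious if v >= low_thr]

print(round(low_thr, 3), forced_positives)
```

The threshold lands at about 0.12, so anchors with IoU 0.15 and 0.12 would be labeled positive even though a fixed threshold such as 0.5 would reject them.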


[Object Detection] Positive and Negative Sample Matching Strategies