nanodet-plus reading: (2) Definition of positive and negative samples (SimOTA)

I. Introduction

Let me introduce a few well-written blogs about SimOTA. They already explain it very well, but I wrote this post anyway, mainly for the convenience of my own review; after all, people are always more comfortable with what they wrote themselves. I also paste the nanodet-plus SimOTA code comments I wrote, along with the blogs I think are good:
1. Dynamic soft label allocation;
2. A complete explanation of the core foundation of Yolox;
3. Interpretation of YOLOX (OTA / SimOTA);

II. Main text

First, I will describe the SimOTA process in words, and try my best to answer the questions I had when reading the code.

  1. Determine whether the top-left corner of each prior (actually a cell/grid as in the YOLO series, similar in effect to an anchor) is inside a gt. A prior's top-left corner may fall inside more than one gt; that doesn't matter, since this step only checks whether the corner falls inside any gt at all. Finally, output a boolean record array, valid_mask;

    Note: the number of priors equals the number of predicted bboxes, so screening priors is really screening predicted bboxes. Because the nanodet series is an anchor-free model, each predicted bbox's coordinates are decoded from the corresponding prior's top-left coordinate.

prior_center = priors[:, :2]  # top-left corner of each cell in image coordinates, shape = (num_priors, 2)
# If all four deltas are positive, the cell's top-left corner is inside the gt
lt_ = prior_center[:, None] - gt_bboxes[:, :2]  # offsets from each gt's top-left corner to every cell's top-left corner, shape = (num_priors, num_gt, 2)
rb_ = gt_bboxes[:, 2:] - prior_center[:, None]  # offsets from every cell's top-left corner to each gt's bottom-right corner, shape = (num_priors, num_gt, 2)

deltas = torch.cat([lt_, rb_], dim=-1)  # (num_priors, num_gt, 4), coordinate deltas (delta_x1, delta_y1, delta_x2, delta_y2)
# Check whether each cell's top-left corner is inside each gt: take the minimum of the 4 deltas, then test whether it is > 0.
is_in_gts = deltas.min(dim=-1).values > 0  # shape = (num_priors, num_gt)
# True if the cell's top-left corner is inside at least one gt, otherwise False.
valid_mask = is_in_gts.sum(dim=1) > 0  # shape = (num_priors, ), relation of each cell's top-left corner to all gts
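The tensor logic above can be sketched in plain Python. This is my own toy code, not the nanodet-plus source, and the coordinates below are made up for illustration:

```python
# A prior's top-left corner (cx, cy) is inside a gt box (x1, y1, x2, y2)
# iff all four coordinate deltas are positive.
def center_in_gt(prior_center, gt_box):
    cx, cy = prior_center
    x1, y1, x2, y2 = gt_box
    deltas = (cx - x1, cy - y1, x2 - cx, y2 - cy)  # lt_ and rb_ concatenated
    return min(deltas) > 0  # same test as deltas.min(dim=-1).values > 0

def valid_mask(priors, gt_boxes):
    # True if the prior's top-left corner is inside at least one gt
    return [any(center_in_gt(p, g) for g in gt_boxes) for p in priors]

priors = [(10, 10), (50, 50), (200, 200)]
gts = [(0, 0, 40, 40), (30, 30, 100, 100)]
print(valid_mask(priors, gts))  # [True, True, False]
```

The third prior is rejected because its corner lies outside both gts, mirroring how valid_mask screens priors before any cost is computed.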
  2. Use the valid_mask array to preliminarily screen the predicted bboxes and confidence scores, obtaining the preliminary screening results valid_decoded_bbox and valid_pred_scores;
valid_decoded_bbox = decoded_bboxes[valid_mask]  # screening, shape = (num_valid, 4)
valid_pred_scores = pred_scores[valid_mask]  # screening, shape = (num_valid, num_classes)
num_valid = valid_decoded_bbox.size(0)  # number of bboxes that pass the screening
  3. Compute the cost matrix (cost_matrix) between the preliminarily screened predicted bboxes and the gts: iou_cost + cls_cost. The cls cost here follows the idea of the nanodet / gfl classification loss, i.e., the classification target is the predicted IOU value. See my previous blog for details, so I won't repeat them here.
# shape = (num_valid, num_gt). The larger the IOU, the better the match; we want bboxes with large IOU.
pairwise_ious = bbox_overlaps(valid_decoded_bbox, gt_bboxes)  # IOU between each qualified bbox and each gt
# Convert to an IOU loss: the larger the IOU (closer to 1), the smaller the loss
iou_cost = -torch.log(pairwise_ious + 1e-7)

gt_onehot_label = (
    F.one_hot(gt_labels.to(torch.int64), pred_scores.shape[-1])  # shape = (num_gts, num_classes)
    .float()
    .unsqueeze(0)  # shape = (1, num_gts, num_classes)
    .repeat(num_valid, 1, 1)  # shape = (num_valid, num_gts, num_classes)
)
# shape becomes (num_valid, num_gt, num_classes)
valid_pred_scores = valid_pred_scores.unsqueeze(1).repeat(1, num_gt, 1)
# Following the gfl idea: use the IOU value as the classification label
soft_label = gt_onehot_label * pairwise_ious[..., None]
scale_factor = soft_label - valid_pred_scores
# still the gfl idea
cls_cost = F.binary_cross_entropy(
    valid_pred_scores, soft_label, reduction="none"
) * scale_factor.abs().pow(2.0)

cls_cost = cls_cost.sum(dim=-1)
# shape = (num_valid, num_gt). This cost array combines the classification loss and the bbox loss. self.iou_factor = 3
cost_matrix = cls_cost + iou_cost * self.iou_factor  # IOU is weighted more: at the label-assignment stage, the larger the IOU, the better the label matches the bbox
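To make the weighting concrete, here is a pure-Python sketch of the cost for a single (bbox, gt) pair, considering only the gt's own class (whose one-hot entry is 1). This is my own toy code with made-up numbers, not the nanodet-plus source; iou_factor = 3 as above:

```python
import math

def bce(p, t):
    # binary cross entropy of one probability p against soft target t
    return -(t * math.log(p) + (1 - t) * math.log(1 - p))

def pair_cost(pred_score, iou_val, iou_factor=3.0):
    iou_cost = -math.log(iou_val + 1e-7)
    soft_label = iou_val  # gfl-style: the classification target is the IOU
    scale = abs(soft_label - pred_score) ** 2
    return bce(pred_score, soft_label) * scale + iou_factor * iou_cost

# A well-localized, confident prediction is much cheaper than a poor one:
good = pair_cost(pred_score=0.8, iou_val=0.9)
bad = pair_cost(pred_score=0.3, iou_val=0.2)
print(good < bad)  # True
```

With these numbers the IOU term dominates (roughly 0.32 vs. 4.8 total cost), which is the point of iou_factor = 3: at the assignment stage, localization quality matters most.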
  4. Sort the pairwise_ious matrix (the IOU values between the preliminarily screened bboxes and each gt) column by column with topk, outputting, in descending order, the top topk IOU values between each gt and all candidate bboxes. Then sum each column and truncate the sum to an integer (clamped to at least 1) to get the number of bboxes each gt can be paired with, i.e., the dynamic k value (dynamic_k). After obtaining each gt's dynamic_k, run topk on the cost matrix (cost in the code below) over the preliminarily screened predictions, and for each gt select the dynamic_k bboxes with the smallest cost as its matches.
 # select candidate topk ious for dynamic-k calculation
 candidate_topk = min(self.topk, pairwise_ious.size(0))  # take the smaller of the two to avoid an out-of-range error
 # top-topk IOU values between each gt and all candidate bboxes, in descending order. shape = (candidate_topk, num_gt)
 topk_ious, _ = torch.topk(pairwise_ious, candidate_topk, dim=0)
 # calculate dynamic k for each gt: sum the top-topk IOU values per gt, truncate to int, then clamp.
 # shape = (num_gt, ); each element is how many bboxes the corresponding gt can match; the minimum is 1 because every gt must match at least one bbox
 dynamic_ks = torch.clamp(topk_ious.sum(0).int(), min=1)
 for gt_idx in range(num_gt):
     _, pos_idx = torch.topk(
         cost[:, gt_idx], k=dynamic_ks[gt_idx].item(), largest=False
     )  # ascending order; select the dynamic_k bboxes with the smallest cost
     matching_matrix[:, gt_idx][pos_idx] = 1.0  # set the element to 1 where the gt matches a bbox
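The dynamic_k rule for one gt (one column of pairwise_ious) can be sketched in plain Python; the IOU values below are made up for illustration:

```python
# Sum the top-k IOUs for one gt, truncate to int, clamp to at least 1.
def dynamic_k(ious_for_one_gt, topk=10):
    k = min(topk, len(ious_for_one_gt))       # avoid out-of-range, as in the code above
    top = sorted(ious_for_one_gt, reverse=True)[:k]
    return max(1, int(sum(top)))              # int() truncates, like .int() in torch

print(dynamic_k([0.8, 0.7, 0.6, 0.1, 0.05]))  # 2  (sum of top IOUs = 2.25, truncated)
print(dynamic_k([0.1, 0.05]))                 # 1  (sum truncates to 0, clamped to 1)
```

A gt with several high-IOU candidates gets more matched bboxes, while a poorly covered gt still gets at least one.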

The figure below is a schematic of the column-wise processing of pairwise_ious; the subsequent processing of the cost matrix is similar.
(schematic figure omitted)

  5. The next operation is deduplication: each gt may match several bboxes at the same time, but each bbox may only match one gt.
prior_match_gt_mask = matching_matrix.sum(1) > 1  # > 1 means some bboxes are matched to multiple gts.
if prior_match_gt_mask.sum() > 0:  # check whether any bbox matched multiple gts
    # The lines below remove the multi-gt matches so that each bbox matches exactly one gt
    cost_min, cost_argmin = torch.min(cost[prior_match_gt_mask, :], dim=1)  # pick the gt with the smallest cost for each such bbox
    matching_matrix[prior_match_gt_mask, :] *= 0.0
    matching_matrix[prior_match_gt_mask, cost_argmin] = 1.0  # everything except the smallest-cost gt is set to 0
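The same deduplication can be sketched in plain Python with a (num_bbox, num_gt) matching matrix; the matrices below are toy data of my own:

```python
# If a bbox (row) is matched to several gts, keep only the gt with the
# smallest cost in that row.
def dedup(matching, cost):
    for i, row in enumerate(matching):
        if sum(row) > 1:  # this bbox matched more than one gt
            best = min(range(len(row)), key=lambda j: cost[i][j])
            matching[i] = [1.0 if j == best else 0.0 for j in range(len(row))]
    return matching

matching = [[1, 1], [0, 1]]              # bbox 0 matched both gts
cost = [[0.2, 0.5], [0.9, 0.1]]
print(dedup(matching, cost))  # [[1.0, 0.0], [0, 1]]
```

Row 0 keeps only gt 0 (cost 0.2 < 0.5); row 1 already matches a single gt and is left alone.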

The specific operation is to take, row by row, the gt with the smallest cost in the cost matrix; that gt is the one the current bbox matches, which avoids one bbox matching multiple gts.
(schematic figure omitted)
It is worth noting that although prior appears very rarely in the steps above while bbox shows up everywhere, the object the positive/negative sample assignment algorithm actually processes is the prior, not the bbox. The bbox is used only because bboxes correspond one-to-one with priors in number and position, and bboxes are more convenient in the calculations above. The results output above are ultimately applied to the priors.

Finally, to sum up how many screenings are done in total (I have already marked them above) and what each method is:
① Determine whether each prior's top-left corner is inside a gt;
② Sort the pairwise_ious matrix column by column with topk to get each gt's dynamic k value (dynamic_k);
③ Sort the cost matrix column by column with topk to select the dynamic_k bboxes each gt will match;
④ For bboxes matched to multiple gts, output the smallest-cost gt row by row in the cost matrix; that is the gt the bbox matches.

III. Questions and personal understanding

  1. The calculation of dynamic_k. How dynamic_k is derived is described above, so I won't repeat it. Using IOU to evaluate how well a bbox matches a gt is certainly reasonable, but summing the topk IOU values and truncating puzzled me at first. Imagine instead filtering the IOU values against a threshold and counting how many qualify, as in yolo v3: that sounds natural, but it introduces a prior parameter, the IOU threshold, and prior knowledge is exactly what the author wants to avoid, so that method was not chosen. Instead the author sums the IOU values directly to capture the overall matching quality: the better the overall match, the larger the sum of the IOU values, and the larger dynamic_k, and vice versa.
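A toy comparison of the two candidate rules discussed above (the IOU numbers and the 0.5 threshold are made up for illustration):

```python
ious = [0.8, 0.7, 0.6, 0.1, 0.05]  # IOUs of candidate bboxes for one gt

# Rejected idea: count IOUs above a hand-tuned threshold (needs a prior).
k_threshold = sum(1 for v in ious if v > 0.5)

# Chosen idea: truncate the sum of the top-k IOUs (threshold-free).
k_dynamic = max(1, int(sum(sorted(ious, reverse=True)[:10])))

print(k_threshold, k_dynamic)  # 3 2
```

The first rule changes if the threshold moves (0.5 vs. 0.65 gives different counts); the second depends only on the IOU values themselves.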
  2. The in-place update of valid_mask.
    This becomes clearer after reading the code comments: the valid_mask array is assigned and updated inside the dynamic_k_matching() function, but the function does not return a new valid_mask. At first I guessed that the memory address had not changed, so there was no need to return it; after checking a blog about Python's object memory mechanism, this turned out to be the reason.
    To sum up, reassigning some elements of an array does not create a new object, but operations such as addition, subtraction, multiplication, and division do create a new object.
    valid_mask[valid_mask.clone()] = fg_mask_inboxes  # this assignment does not allocate new memory, so valid_mask here is the same object as the caller's valid_mask
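The object-identity rule stated above can be demonstrated with a plain Python list (the same principle applies to torch tensors and explains why dynamic_k_matching() can update valid_mask without returning it):

```python
a = [True, True, False]
addr = id(a)

a[0] = False        # element assignment mutates in place: same object
assert id(a) == addr

a = a + [True]      # '+' builds a new list and rebinds the name: new object
assert id(a) != addr
print("identity checks passed")
```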
    
  3. Why the prior's top-left corner should be inside the gt.
    I mentioned this in my previous nanodet blog: nanodet-plus learns the distances from the prior's top-left corner to the four gt boundaries, and we want all four learned distances to be positive. If the prior's top-left corner were outside the gt, the model would have to learn a negative distance to pull it back. Second, the prior's top-left corner being inside the gt means there are more features to learn from.
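A small sketch of that point, with made-up coordinates: the regression targets are the (left, top, right, bottom) distances from the prior's top-left corner to the four gt edges, and all four are positive only when that corner is inside the gt.

```python
def ltrb(prior, gt):
    # distances from the prior's top-left corner to the four gt edges
    cx, cy = prior
    x1, y1, x2, y2 = gt
    return (cx - x1, cy - y1, x2 - cx, y2 - cy)

print(ltrb((10, 10), (0, 0, 40, 40)))  # (10, 10, 30, 30): corner inside, all positive
print(ltrb((50, 10), (0, 0, 40, 40)))  # (50, 10, -10, 30): corner outside, a negative appears
```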

Origin blog.csdn.net/tangshopping/article/details/128002292