I. Introduction
Let me first introduce a few well-written blogs about SimOTA. They already cover the topic very well, but I still wrote this post as an inferior sequel, mainly for the convenience of my own review; after all, people are always more comfortable with what they wrote themselves. In addition, I paste the nanodet-plus SimOTA code comments I wrote, along with the blogs I think are good:
1. Dynamic soft label assignment;
2. A complete explanation of the core foundation of YOLOX;
3. Interpretation of YOLOX (OTA / SimOTA);
II. Main text
First, let me describe the SimOTA process in words; I will try my best to explain clearly the questions I had while reading the code.
- Determine whether the upper left corner of each prior (actually the cell/grid in the YOLO series, similar in effect to an anchor) is inside a gt box ( ① ). Some priors' upper left corners may fall inside more than one gt; that doesn't matter, since this step only checks whether the upper left corner falls inside some gt. Finally, output a result array: valid_mask.
Note: the number of priors equals the number of predicted bboxes, so screening priors is actually screening predicted bboxes. Because the nanodet series is anchor-free, each predicted bbox's coordinates are computed from the corresponding prior's top-left coordinate.
prior_center = priors[:, :2] # top-left coordinates of each cell in the image coordinate system, shape = (num_priors, 2)
# If all four differences are positive, the cell's top-left corner lies inside the gt
lt_ = prior_center[:, None] - gt_bboxes[:, :2] # differences between every cell's top-left corner and each gt's top-left corner, shape = (num_priors, num_gt, 2)
rb_ = gt_bboxes[:, 2:] - prior_center[:, None] # differences between each gt's bottom-right corner and every cell's top-left corner, shape = (num_priors, num_gt, 2)
deltas = torch.cat([lt_, rb_], dim=-1) # (num_priors, num_gt, 4), coordinate differences (delta_x1, delta_y1, delta_x2, delta_y2)
# Check whether each cell's top-left corner is inside the gt: take the minimum of the 4 differences, then test whether it is greater than 0.
is_in_gts = deltas.min(dim=-1).values > 0 # shape = (num_priors, num_gt)
# True if the cell's top-left corner is inside at least one gt, otherwise False.
valid_mask = is_in_gts.sum(dim=1) > 0 # shape = (num_priors, ), relation of each cell's top-left corner to all gts
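As a sanity check, here is a toy run of the snippet above with made-up numbers of my own (3 prior top-left corners, 2 gt boxes); the tensor shapes match the comments:

```python
import torch

# Toy values (mine, not nanodet's): 3 prior top-left corners and
# 2 gt boxes in (x1, y1, x2, y2) form.
prior_center = torch.tensor([[5., 5.], [15., 15.], [50., 50.]])
gt_bboxes = torch.tensor([[0., 0., 10., 10.],
                          [10., 10., 20., 20.]])

lt_ = prior_center[:, None] - gt_bboxes[:, :2]   # (3, 2, 2)
rb_ = gt_bboxes[:, 2:] - prior_center[:, None]   # (3, 2, 2)
deltas = torch.cat([lt_, rb_], dim=-1)           # (3, 2, 4)
is_in_gts = deltas.min(dim=-1).values > 0        # (3, 2)
valid_mask = is_in_gts.sum(dim=1) > 0            # (3,)
# The first prior is inside gt 0, the second inside gt 1,
# the third inside neither, so valid_mask = [True, True, False].
```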
Use the valid_mask array to preliminarily screen the predicted bboxes and confidence scores, obtaining the preliminary screening results: valid_decoded_bbox and valid_pred_scores;
valid_decoded_bbox = decoded_bboxes[valid_mask] # screening, shape = (num_valid, 4)
valid_pred_scores = pred_scores[valid_mask] # screening, shape = (num_valid, num_classes)
num_valid = valid_decoded_bbox.size(0) # number of bboxes that pass the check
- Compute the cost matrix (cost_matrix) between the preliminarily screened predicted bboxes and the gts: iou_cost + cls_cost. The cls cost here still follows the gfl idea used by nanodet, that is, the classification loss uses the predicted iou value as its label. See my previous blog for details, so I won't go into them here.
# shape = (num_valid, num_gt). The larger the IOU, the better the match; we want bboxes with large IOU.
pairwise_ious = bbox_overlaps(valid_decoded_bbox, gt_bboxes) # IOU between the screened bboxes and the gts
# Convert to an IOU loss: the larger the IOU (closer to 1), the smaller the loss
iou_cost = -torch.log(pairwise_ious + 1e-7)
gt_onehot_label = (
F.one_hot(gt_labels.to(torch.int64), pred_scores.shape[-1]) # shape = (num_gts, num_classes)
.float()
.unsqueeze(0) # shape = (1, num_gts, num_classes)
.repeat(num_valid, 1, 1) # shape = (num_valid, num_gts, num_classes)
)
# shape becomes (num_valid, num_gt, num_classes)
valid_pred_scores = valid_pred_scores.unsqueeze(1).repeat(1, num_gt, 1)
# Following the gfl idea, use the IOU value as the classification label
soft_label = gt_onehot_label * pairwise_ious[..., None]
scale_factor = soft_label - valid_pred_scores
# Still the gfl idea
cls_cost = F.binary_cross_entropy(
valid_pred_scores, soft_label, reduction="none"
) * scale_factor.abs().pow(2.0)
cls_cost = cls_cost.sum(dim=-1)
# shape = (num_valid, num_gt). This cost combines the classification loss and the bbox loss. self.iou_factor = 3
cost_matrix = cls_cost + iou_cost * self.iou_factor # weight IOU more: this is the label-assignment stage, and a larger IOU means the label matches the bbox better
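To make the gfl-style soft label concrete, here is a minimal toy case (the numbers and the two hypothetical predictions are mine): one candidate bbox, 3 classes, gt class 1, IOU 0.8. A prediction whose score agrees with the IOU target incurs a much smaller cls cost than an underconfident one, because of the focal-style `scale_factor` modulation:

```python
import torch
import torch.nn.functional as F

# Hypothetical toy case: one candidate bbox, 3 classes, gt class = 1,
# and the bbox's IOU with the gt is 0.8.
pairwise_iou = torch.tensor(0.8)
gt_onehot = F.one_hot(torch.tensor(1), num_classes=3).float()  # [0., 1., 0.]
soft_label = gt_onehot * pairwise_iou  # gfl-style target: [0., 0.8, 0.]

def cls_cost(pred, target):
    # focal-style modulation, as in the snippet above
    scale = (target - pred).abs().pow(2.0)
    return (F.binary_cross_entropy(pred, target, reduction="none") * scale).sum()

good_pred = torch.tensor([0.05, 0.80, 0.05])  # score agrees with the IOU target
bad_pred = torch.tensor([0.05, 0.10, 0.05])   # underconfident score
assert cls_cost(good_pred, soft_label) < cls_cost(bad_pred, soft_label)
```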
- Sort the pairwise_ious matrix (iou values between each preliminarily screened bbox and each gt) by column and take the topk ( ② ), outputting in descending order the top topk iou values between each gt and all candidate bboxes. Then sum by column and truncate each sum to an integer to get the number of bboxes each gt can match (at least 1), that is, the dynamic k value (dynamic_k). After obtaining each gt's dynamic_k, sort the cost matrix (cost_matrix, named cost in the code below) by column and take the topk ( ③ ), selecting for each gt the dynamic_k bboxes with the smallest cost to match.
# select candidate topk ious for dynamic-k calculation
candidate_topk = min(self.topk, pairwise_ious.size(0)) # take the smaller of the two to avoid an index error
# Output, in descending order, the top topk IOU values between each gt and all candidate bboxes. shape = (candidate_topk, num_gt)
topk_ious, _ = torch.topk(pairwise_ious, candidate_topk, dim=0)
# calculate dynamic k for each gt: sum each gt's top-k IOU values, truncate to an integer, then clamp.
# shape = (num_gt, ). Each element is how many bboxes the corresponding gt can match; min=1 because every gt must match at least one bbox.
dynamic_ks = torch.clamp(topk_ious.sum(0).int(), min=1)
for gt_idx in range(num_gt):
_, pos_idx = torch.topk(
cost[:, gt_idx], k=dynamic_ks[gt_idx].item(), largest=False
) # ascending: pick the dynamic_k bboxes with the smallest cost
matching_matrix[:, gt_idx][pos_idx] = 1.0 # set to 1 where the gt matches a bbox
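A toy run of this step with made-up numbers (5 candidate bboxes, 2 gts, topk = 3; as a stand-in cost I simply use -IOU, so lower cost means higher IOU):

```python
import torch

# Made-up IOU matrix: 5 candidate bboxes x 2 gts.
pairwise_ious = torch.tensor([[0.9, 0.1],
                              [0.8, 0.2],
                              [0.7, 0.1],
                              [0.1, 0.6],
                              [0.1, 0.5]])
candidate_topk = min(3, pairwise_ious.size(0))
topk_ious, _ = torch.topk(pairwise_ious, candidate_topk, dim=0)
# column sums: gt0 -> 0.9+0.8+0.7 = 2.4, gt1 -> 0.6+0.5+0.2 = 1.3
dynamic_ks = torch.clamp(topk_ious.sum(0).int(), min=1)  # [2, 1]

# Pick for each gt the dynamic_k candidates with the smallest cost;
# here -IOU serves as a stand-in cost.
cost = -pairwise_ious
matching_matrix = torch.zeros_like(pairwise_ious)
for gt_idx in range(pairwise_ious.size(1)):
    _, pos_idx = torch.topk(cost[:, gt_idx], k=dynamic_ks[gt_idx].item(),
                            largest=False)
    matching_matrix[pos_idx, gt_idx] = 1.0
# gt0 matches bboxes 0 and 1; gt1 matches bbox 3.
```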
The figure below is a schematic diagram of the column-wise processing of pairwise_ious. The subsequent processing of the cost matrix is similar.
- The following operation is deduplication: each gt can match multiple bboxes at the same time, but each bbox can only match one gt.
prior_match_gt_mask = matching_matrix.sum(1) > 1 # greater than 1 means some bboxes match more than one gt
if prior_match_gt_mask.sum() > 0: # check whether any bbox matched multiple gts
# The next lines remove the multi-gt matches: each bbox keeps only one gt
cost_min, cost_argmin = torch.min(cost[prior_match_gt_mask, :], dim=1) # keep the gt with the smallest cost for each such bbox
matching_matrix[prior_match_gt_mask, :] *= 0.0
matching_matrix[prior_match_gt_mask, cost_argmin] = 1.0 # set everything except the smallest-cost gt to 0
The specific operation is to output, row by row, the gt with the smallest cost in the cost matrix ( ④ ); that gt is the one the current bbox should match, which avoids one bbox matching multiple gts.
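The deduplication step above can be checked with a tiny example (my own made-up matrices): bbox 1 starts out matched to both gts, and after the operation it keeps only the smaller-cost gt:

```python
import torch

# Toy case: bbox (row) 1 is currently matched to both gts.
matching_matrix = torch.tensor([[1., 0.],
                                [1., 1.],
                                [0., 1.]])
cost = torch.tensor([[0.2, 0.9],
                     [0.5, 0.3],
                     [0.8, 0.4]])
prior_match_gt_mask = matching_matrix.sum(1) > 1  # [False, True, False]
if prior_match_gt_mask.sum() > 0:
    cost_min, cost_argmin = torch.min(cost[prior_match_gt_mask, :], dim=1)
    matching_matrix[prior_match_gt_mask, :] *= 0.0
    matching_matrix[prior_match_gt_mask, cost_argmin] = 1.0
# Row 1 now keeps only gt 1, since its cost 0.3 < 0.5.
```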
It is worth noting that although prior appears only rarely in the steps above and we mostly see the shadow of bbox, I still have to say that the object of the positive/negative sample assignment algorithm is the prior, not the bbox. bbox is used only because the number and positions of the bboxes correspond one-to-one to the priors, and bbox is more convenient in the calculations above. The results output above will ultimately be applied to the priors in the network.
Finally, to sum up how many screenings were done in total (in fact, I have already marked them above), and what each method is:
① Determine whether the prior's upper left corner is inside a gt;
② Sort the pairwise_ious matrix by column and take the topk to get each gt's dynamic k value (dynamic_k);
③ Sort the cost matrix by column and take the topk to select for each gt the dynamic_k bboxes to match;
④ Output, row by row, the gt with the smallest cost in the cost matrix; that is the gt the bbox matches.
III. Questions and personal understanding
- The calculation of dynamic_k; how it is derived is described above, so I won't repeat it here. First of all, using iou to evaluate how well a bbox matches a gt is correct, but the author sums the topk iou values and truncates, which puzzled me at first. Just imagine: if you first filtered the iou values against a threshold and then counted the number of qualifying bboxes, it would sound reasonable, like yolo v3. But that would introduce a prior parameter (an iou threshold), and prior knowledge is exactly what the author wants to avoid, so the author did not choose this method. Instead, the author sums the iou values directly to capture the overall iou situation: the better the overall matching, the larger the sum of the iou values, and therefore the larger dynamic_k, and vice versa.
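The contrast can be shown numerically with a made-up set of top-k IOUs for one gt: the threshold-counting rule gives a different answer depending on the chosen threshold, while the sum-and-truncate rule needs no such parameter:

```python
import torch

# Hypothetical top-k IOU values of the candidate bboxes for one gt.
topk_ious = torch.tensor([0.85, 0.80, 0.75, 0.30, 0.10])

# A yolo v3-style rule would count IOUs above a threshold,
# but the answer depends on that extra prior parameter:
count_loose = (topk_ious > 0.2).sum().item()   # 4 candidates pass
count_strict = (topk_ious > 0.7).sum().item()  # only 3 pass

# SimOTA just sums and truncates; no threshold is involved:
dynamic_k = max(int(topk_ious.sum().item()), 1)  # int(2.8) -> 2
```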
- The assignment update of valid_mask; this will be clearer after reading the code comments. The main point is that the valid_mask array is assigned and updated inside the dynamic_k_matching() function, but the function does not return a new valid_mask. At the time I guessed that the memory address had not changed, so there was no need to return it, and after checking blogs on Python's object memory mechanism, this was confirmed.
To sum up: reassigning some elements of an array does not create a new object, but operations on the array such as addition, subtraction, multiplication or division do create a new object.
valid_mask[valid_mask.clone()] = fg_mask_inboxes # this assignment does not allocate new memory, so this valid_mask is the same object as the one at line 114
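This in-place behavior is easy to verify with a small standalone tensor (values are mine) by comparing storage addresses before and after:

```python
import torch

valid_mask = torch.tensor([True, True, False, True])
addr_before = valid_mask.data_ptr()

# In-place element assignment keeps the same storage, so the caller
# sees the update even though the function never returns valid_mask.
valid_mask[valid_mask.clone()] = torch.tensor([True, False, True])
assert valid_mask.data_ptr() == addr_before
assert valid_mask.tolist() == [True, False, False, True]

# An arithmetic/logical expression builds a brand-new tensor instead.
new_mask = valid_mask | torch.tensor([False, False, True, False])
assert new_mask.data_ptr() != addr_before
```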
- Why should the prior's upper left corner be inside the gt? I mentioned this in my previous nanodet-plus blog: because nanodet learns the distances from the prior's upper left corner to the four boundaries of the gt, and we hope the four learned distances are all positive numbers. If the prior's upper left corner is outside the gt, one of the targets becomes negative, and the model cannot learn a negative distance to pull it back. Second, the prior's upper left corner being inside the gt means there are more features to learn.
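A quick numeric check of this point, with a made-up gt box and two hypothetical prior positions:

```python
import torch

gt = torch.tensor([10., 10., 50., 50.])  # gt box: x1, y1, x2, y2

# Prior top-left inside the gt: all four regression targets
# (left, top, right, bottom distances) come out positive.
inside = torch.tensor([20., 30.])
l, t = (inside - gt[:2]).tolist()
r, b = (gt[2:] - inside).tolist()
assert min(l, t, r, b) > 0  # 10, 20, 30, 20

# Prior top-left outside the gt: at least one target is negative,
# which a non-negative distance prediction can never fit.
outside = torch.tensor([5., 30.])
targets = torch.cat([outside - gt[:2], gt[2:] - outside])
assert targets.min().item() < 0  # the left distance is -5
```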