Detailed explanation of anchor_target_3d

Recently I have been studying the 3D IoU loss. The principle is very simple, but parts of the implementation were unclear to me, so I am sorting it out here.

In mmdetection3d, the IoU should be computed between the predicted 3D bboxes and the ground-truth 3D bboxes.
In the actual implementation, since the loss has to take part in forward and backward propagation, it belongs in pts_bbox_head.
One thing to know in advance: anchors has shape torch.Size([1, 200, 176, 3, 2, 7]).
200x176 is the BEV grid of anchor positions, 3 is the number of anchor sizes (one per class), 2 is the number of yaw directions, and 7 is the number of bbox parameters.
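As a quick sanity check on those numbers, the anchor tensor flattens into one row per anchor (a minimal sketch; only the shape quoted above is assumed):

```python
import torch

# Hypothetical anchor tensor with the shape quoted above:
# batch 1, a 200x176 BEV grid, 3 anchor sizes, 2 yaw directions,
# and 7 box parameters (x, y, z, w, l, h, yaw).
anchors = torch.zeros(1, 200, 176, 3, 2, 7)

# Flattening everything except the box dimension yields one row per
# anchor: 200 * 176 * 3 * 2 = 211200 anchors in total.
flat = anchors.reshape(-1, 7)
print(flat.shape)  # torch.Size([211200, 7])
```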

        # regression loss
        # torch.Size([1, 42, 200, 176]) -> [211200, 7]
        bbox_pred = bbox_pred.permute(0, 2, 3, 1).reshape(-1, self.box_code_size)
        bbox_targets = bbox_targets.reshape(-1, self.box_code_size)
        bbox_weights = bbox_weights.reshape(-1, self.box_code_size)
        pos_inds = ((labels >= 0)
                    & (labels < bg_class_ind)).nonzero(
                        as_tuple=False).reshape(-1)
        # this finds the indices of positive anchors among the
        # 211200 (200x176x6) anchors; label = 0, 1, 2

Next, the predicted and ground-truth values are extracted by these indices:

        pos_bbox_pred = bbox_pred[pos_inds]
        pos_bbox_targets = bbox_targets[pos_inds]
        pos_bbox_weights = bbox_weights[pos_inds]

What I mainly want to figure out here is how these two sets of boxes correspond to each other, since that is what makes the IoU between them computable.

In the code that produces the final predictions (inference):

            # torch.Size([42, 200, 176]) ->[211200, 7]
            bbox_pred = bbox_pred.permute(1, 2, 0).reshape(-1, self.box_code_size)

            nms_pre = cfg.get('nms_pre', -1)
            if nms_pre > 0 and scores.shape[0] > nms_pre:
                if self.use_sigmoid_cls:
                    max_scores, _ = scores.max(dim=1)
                else:
                    max_scores, _ = scores[:, :-1].max(dim=1)
                _, topk_inds = max_scores.topk(nms_pre)
                # keep the nms_pre (here 100) predictions with the highest scores
                anchors = anchors[topk_inds, :]
                bbox_pred = bbox_pred[topk_inds, :]
                scores = scores[topk_inds, :]
                dir_cls_score = dir_cls_score[topk_inds]

The prediction matrix is then decoded into the actual values of the predicted boxes:

            bboxes = self.bbox_coder.decode(anchors, bbox_pred)
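
For reference, this is roughly what the delta decoding does for the (x, y, z, w, l, h, r) encoding. This is a from-memory sketch of what mmdetection3d's DeltaXYZWLHRBBoxCoder.decode computes, so consult the actual coder for the authoritative formulas:

```python
import torch

def decode(anchors, deltas):
    # anchors/deltas: [N, 7] as (x, y, z, w, l, h, r); z is the box bottom.
    xa, ya, za, wa, la, ha, ra = torch.split(anchors, 1, dim=-1)
    xt, yt, zt, wt, lt, ht, rt = torch.split(deltas, 1, dim=-1)
    za = za + ha / 2                      # bottom center -> gravity center
    diagonal = torch.sqrt(la**2 + wa**2)  # BEV diagonal of the anchor
    xg = xt * diagonal + xa               # centers: offsets scaled by diagonal
    yg = yt * diagonal + ya
    zg = zt * ha + za
    wg = torch.exp(wt) * wa               # sizes: log-space offsets
    lg = torch.exp(lt) * la
    hg = torch.exp(ht) * ha
    rg = rt + ra                          # yaw: plain offset
    zg = zg - hg / 2                      # back to bottom center
    return torch.cat([xg, yg, zg, wg, lg, hg, rg], dim=-1)

# Zero deltas should reproduce the anchor itself.
anchor = torch.tensor([[2.0114, 3.8191, -1.78, 1.6, 3.9, 1.56, 1.57]])
print(decode(anchor, torch.zeros(1, 7)))
```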

Then:

        mlvl_bboxes_for_nms = xywhr2xyxyr(
            input_meta['box_type_3d'](
                mlvl_bboxes, box_dim=self.box_code_size).bev)

That is, the (x, y, z, w, l, h, r) boxes are projected to BEV and converted from (cx, cy, w, h, r) to (x1, y1, x2, y2, r) form before NMS is computed.
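The conversion itself is simple. Here is a sketch of what the xywhr2xyxyr helper roughly does (paraphrased, not the library source):

```python
import torch

def xywhr2xyxyr(boxes_xywhr):
    """Convert BEV boxes from (cx, cy, w, h, r) to (x1, y1, x2, y2, r).

    The yaw angle r is carried through unchanged; only the center/size
    representation becomes a corner representation for the NMS step.
    """
    boxes = torch.zeros_like(boxes_xywhr)
    half_w = boxes_xywhr[..., 2] / 2
    half_h = boxes_xywhr[..., 3] / 2
    boxes[..., 0] = boxes_xywhr[..., 0] - half_w
    boxes[..., 1] = boxes_xywhr[..., 1] - half_h
    boxes[..., 2] = boxes_xywhr[..., 0] + half_w
    boxes[..., 3] = boxes_xywhr[..., 1] + half_h
    boxes[..., 4] = boxes_xywhr[..., 4]
    return boxes

# A 2x4 box centered at the origin with yaw 0.5:
print(xywhr2xyxyr(torch.tensor([[0.0, 0.0, 2.0, 4.0, 0.5]])))
```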

For reference, the tensor shapes at this point (per-sample class, bbox and direction heads, plus the decoded boxes):

    [18, 200, 176]           # cls_score: 6 anchors x 3 classes
    [42, 200, 176]           # bbox_pred: 6 anchors x 7 box parameters
    [12, 200, 176]           # dir_cls_pred: 6 anchors x 2 directions
    torch.Size([211200, 7])  # decoded boxes

together with the input_meta dict and rescale = True.

Here bbox_targets has shape torch.Size([70400, 7]) and is initialized to zeros. The 70400 corresponds to the anchor distribution 200x176x2 (for one class), and the 7 is the dimension of the offset values. The targets are computed via self.bbox_coder.encode from pos_bboxes and pos_gt_bboxes.
The final target ends up with 211200 rows because there are 3 self.bbox_assigners of different sizes (different IoU thresholds for the three classes), for example:

    min_pos_iou: 0.2
    neg_iou_thr: 0.2
    pos_iou_thr: 0.35
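
The encode step mentioned above (the inverse of the decoding) can be sketched as follows. Again this is paraphrased from what DeltaXYZWLHRBBoxCoder.encode does, so treat it as illustrative:

```python
import torch

def encode(anchors, gt_bboxes):
    # Both [N, 7] as (x, y, z, w, l, h, r); z is the box bottom.
    xa, ya, za, wa, la, ha, ra = torch.split(anchors, 1, dim=-1)
    xg, yg, zg, wg, lg, hg, rg = torch.split(gt_bboxes, 1, dim=-1)
    za = za + ha / 2                      # compare gravity centers, not bottoms
    zg = zg + hg / 2
    diagonal = torch.sqrt(la**2 + wa**2)  # BEV diagonal of the anchor
    xt = (xg - xa) / diagonal             # centers: normalized offsets
    yt = (yg - ya) / diagonal
    zt = (zg - za) / ha
    wt = torch.log(wg / wa)               # sizes: log-space ratios
    lt = torch.log(lg / la)
    ht = torch.log(hg / ha)
    rt = rg - ra                          # yaw: plain offset
    return torch.cat([xt, yt, zt, wt, lt, ht, rt], dim=-1)

# A box encoded against itself gives an all-zero target.
gt = torch.tensor([[2.3841, 4.0668, -1.6432, 1.6776, 4.0138, 1.7290, 1.3369]])
print(encode(gt, gt))  # tensor([[0., 0., 0., 0., 0., 0., 0.]])
```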

Comparing the predicted boxes with the ground-truth boxes, we find:

the ground-truth boxes:

    tensor([[ 2.3841,  4.0668, -1.6432,  1.6776,  4.0138,  1.7290,  1.3369],
            [ 2.3841,  4.0668, -1.6432,  1.6776,  4.0138,  1.7290,  1.3369],
            [ 2.3841,  4.0668, -1.6432,  1.6776,  4.0138,  1.7290,  1.3369],
            [ 2.3841,  4.0668, -1.6432,  1.6776,  4.0138,  1.7290,  1.3369],
            [ 2.3841,  4.0668, -1.6432,  1.6776,  4.0138,  1.7290,  1.3369],
            [ 2.3841,  4.0668, -1.6432,  1.6776,  4.0138,  1.7290,  1.3369],
            [15.1863,  7.3920, -1.5572,  1.7496,  4.5181,  1.5129,  1.2769],
            [15.1863,  7.3920, -1.5572,  1.7496,  4.5181,  1.5129,  1.2769],
            [15.1863,  7.3920, -1.5572,  1.7496,  4.5181,  1.5129,  1.2769],
            [15.1863,  7.3920, -1.5572,  1.7496,  4.5181,  1.5129,  1.2769],
            [44.4734, 24.8744, -1.9314,  1.8011,  3.4889,  1.6981,  0.9469],
            [44.4734, 24.8744, -1.9314,  1.8011,  3.4889,  1.6981,  0.9469],
            [44.4734, 24.8744, -1.9314,  1.8011,  3.4889,  1.6981,  0.9469],
            [44.4734, 24.8744, -1.9314,  1.8011,  3.4889,  1.6981,  0.9469],
            [44.4734, 24.8744, -1.9314,  1.8011,  3.4889,  1.6981,  0.9469],
            [44.4734, 24.8744, -1.9314,  1.8011,  3.4889,  1.6981,  0.9469]],
           device='cuda:0')

the predicted boxes:

    tensor([[ 2.0114,  3.8191, -1.7800,  1.6000,  3.9000,  1.5600,  1.5700],
            [ 2.4137,  3.8191, -1.7800,  1.6000,  3.9000,  1.5600,  1.5700],
            [ 2.8160,  3.8191, -1.7800,  1.6000,  3.9000,  1.5600,  1.5700],
            [ 2.0114,  4.2211, -1.7800,  1.6000,  3.9000,  1.5600,  1.5700],
            [ 2.4137,  4.2211, -1.7800,  1.6000,  3.9000,  1.5600,  1.5700],
            [ 2.8160,  4.2211, -1.7800,  1.6000,  3.9000,  1.5600,  1.5700],
            [14.4823,  7.4372, -1.7800,  1.6000,  3.9000,  1.5600,  1.5700],
            [14.8846,  7.4372, -1.7800,  1.6000,  3.9000,  1.5600,  1.5700],
            [15.2869,  7.4372, -1.7800,  1.6000,  3.9000,  1.5600,  1.5700],
            [15.6891,  7.4372, -1.7800,  1.6000,  3.9000,  1.5600,  1.5700],
            [43.8491, 24.7236, -1.7800,  1.6000,  3.9000,  1.5600,  1.5700],
            [44.2514, 24.7236, -1.7800,  1.6000,  3.9000,  1.5600,  1.5700],
            [44.6537, 24.7236, -1.7800,  1.6000,  3.9000,  1.5600,  1.5700],
            [45.0560, 24.7236, -1.7800,  1.6000,  3.9000,  1.5600,  1.5700],
            [44.2514, 25.1256, -1.7800,  1.6000,  3.9000,  1.5600,  1.5700],
            [44.6537, 25.1256, -1.7800,  1.6000,  3.9000,  1.5600,  1.5700]],
           device='cuda:0')
Each row of the ground-truth tensor is the ground-truth box, repeated once per matched anchor,
while each row of the prediction tensor is the corresponding individual anchor box.
When calculating iou:
iou_bbox_pred_xyz:

    tensor([[ 16.5084, -12.0242, -1.4971,  1.6237,  4.0466,  1.4356,  0.2724],
            [ 16.5435, -11.8211, -1.4988,  1.6494,  4.3110,  1.4305,  0.2859],
            [ 16.5761, -11.8400, -1.5100,  1.6789,  4.3495,  1.4355,  0.2741],
            [ 16.5754, -11.8323, -1.4928,  1.6733,  4.3177,  1.4270,  0.2728],
            [ 16.5152, -11.7441, -1.4888,  1.6896,  4.3513,  1.4431,  0.2851],
            [ 14.6430,  -7.8684, -1.5783,  0.6639,  0.9870,  1.8041, -1.4695],
            [ 14.6432,  -7.8733, -1.5769,  0.6611,  0.9864,  1.8070,  1.6531],
            [ 14.6598,  -7.8735, -1.5899,  0.6282,  1.0187,  1.8009,  1.6100],
            [ 15.4246,  -6.0949, -1.8054,  0.6283,  1.7626,  1.8466,  1.8692],
            [ 15.3690,  -6.0648, -1.7394,  0.5986,  1.7214,  1.8281,  1.8346],
            [ 15.4138,  -6.1166, -1.7065,  0.5922,  1.7588,  1.7961,  1.8254],
            [ 15.8956,  -2.2017, -1.6940,  0.6018,  1.8273,  1.7625,  1.9136],
            [ 15.9363,  -2.1736, -1.7032,  0.5824,  1.7944,  1.7932,  1.8970],
            [ 15.9095,  -2.1962, -1.7189,  0.5758,  1.7861,  1.7966,  1.8979],
            [ 15.8770,  -2.1996, -1.7384,  0.6050,  1.7874,  1.7796,  1.8670]],
           device='cuda:0', grad_fn=<...>)

iou_bbox_targets_xyz:

    tensor([[ 16.5635, -11.8167, -1.4796,  1.6468,  4.4654,  1.4245,  0.3080],
            [ 16.5635, -11.8167, -1.4796,  1.6468,  4.4654,  1.4245,  0.3080],
            [ 16.5635, -11.8167, -1.4796,  1.6468,  4.4654,  1.4245,  0.3080],
            [ 16.5635, -11.8167, -1.4796,  1.6468,  4.4654,  1.4245,  0.3080],
            [ 16.5635, -11.8167, -1.4796,  1.6468,  4.4654,  1.4245,  0.3080],
            [ 14.5729,  -7.8654, -1.5648,  0.5354,  0.8890,  1.8387, -1.2720],
            [ 14.5729,  -7.8654, -1.5648,  0.5354,  0.8890,  1.8387, -1.2720],
            [ 14.5729,  -7.8654, -1.5648,  0.5354,  0.8890,  1.8387, -1.2720],
            [ 15.2363,  -6.1369, -1.7029,  0.5860,  1.7175,  1.8286, -1.2620],
            [ 15.2363,  -6.1369, -1.7029,  0.5860,  1.7175,  1.8286, -1.2620],
            [ 15.2363,  -6.1369, -1.7029,  0.5860,  1.7175,  1.8286, -1.2620],
            [ 15.9368,  -2.1525, -1.6482,  0.6163,  1.7175,  1.7478,  1.8680],
            [ 15.9368,  -2.1525, -1.6482,  0.6163,  1.7175,  1.7478,  1.8680],
            [ 15.9368,  -2.1525, -1.6482,  0.6163,  1.7175,  1.7478,  1.8680],
            [ 15.9368,  -2.1525, -1.6482,  0.6163,  1.7175,  1.7478,  1.8680]],
           device='cuda:0')


Origin: blog.csdn.net/ll594282475/article/details/121357547