Faster RCNN Principles and PyTorch Code Walkthrough (RPN, Part 6): Further Filtering to Obtain the Final Candidate Boxes

The previous post showed that during training the RPN generates 2000 proposals, but most of these are still background; proposals that actually contain an object remain rare. To speed up convergence and balance the positive and negative samples, the generated proposals are filtered one more time. Note that this step only happens during training; at inference time, the proposals produced by the RPN are fed directly to the next stage of Faster RCNN. The procedure resembles the anchor filtering inside the RPN: an IoU matrix is built between the ground-truth boxes and the proposals, and 256 positive and negative samples are selected according to their overlap with the ground truth.

This step serves three purposes:

1. It selects RoIs that lie closer to real objects, so the positive and negative samples fed into the subsequent network are better balanced, avoiding the situation of too many negatives and too few positives.
2. It reduces the number of RoIs sent to the subsequent fully connected layers, which significantly cuts the computation.
3. Because ground-truth labels are used during the filtering, each selected RoI is also assigned a positive or negative label, and the regression offset from each RoI to its matching ground-truth box can be computed at the same time. This yields the training targets for the RCNN head.

Concretely, the IoU matrix between the proposals and all ground-truth boxes is computed first, and positives and negatives are then selected from it. The criteria are:

1. A proposal whose maximum IoU with any ground-truth box is at least 0.5 is a positive sample.
2. A proposal whose maximum IoU with any ground-truth box is at least 0 (the lower bound is cfg.TRAIN.BG_THRESH_LO) and below 0.5 is a negative sample.
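To make the two criteria concrete, here is a minimal NumPy sketch. The threshold constants mirror cfg.TRAIN.FG_THRESH, BG_THRESH_HI, and BG_THRESH_LO; `classify_proposals` is a hypothetical helper written for illustration, not part of the repo:

```python
import numpy as np

FG_THRESH, BG_THRESH_HI, BG_THRESH_LO = 0.5, 0.5, 0.0  # values mirror cfg.TRAIN.*

def classify_proposals(max_overlaps):
    """Split proposal indices into fg/bg by each proposal's max IoU with any gt box."""
    fg_inds = np.where(max_overlaps >= FG_THRESH)[0]
    bg_inds = np.where((max_overlaps < BG_THRESH_HI) &
                       (max_overlaps >= BG_THRESH_LO))[0]
    return fg_inds, bg_inds

# toy example: max IoU of 5 proposals with their best-matching gt box
ious = np.array([0.9, 0.3, 0.55, 0.1, 0.45])
fg, bg = classify_proposals(ious)
# fg -> [0, 2], bg -> [1, 3, 4]
```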

Filtering this way gives no guarantee on how many positives and negatives survive, so the total number of samples is fixed at 256, of which p are positive. To keep the positive-to-negative ratio roughly 1:3, p is capped at 64; if more than 64 positives are found, 64 of them are drawn at random. The remaining 256 - p slots go to negatives; if more negatives are available, 256 - p of them are drawn at random.
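The quota arithmetic above can be sketched as follows. `sample_quota` is a hypothetical helper written for this post; the repo computes the same quantities inline in `_sample_rois_pytorch`:

```python
import numpy as np

def sample_quota(fg_num, rois_per_image=256, fg_fraction=0.25):
    """How many fg/bg RoIs to keep. Negatives are later drawn with replacement,
    so the background quota need not be capped by the available count."""
    fg_cap = int(np.round(fg_fraction * rois_per_image))  # at most 64 positives
    fg_keep = min(fg_cap, fg_num)
    bg_keep = rois_per_image - fg_keep                    # the rest are background
    return fg_keep, bg_keep

# plenty of positives: keep 64 fg + 192 bg, a 1:3 ratio
print(sample_quota(100))   # (64, 192)
# few positives: keep all 10 and fill the remaining slots with background
print(sample_quota(10))    # (10, 246)
```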

The relevant code lives in lib/model/rpn/proposal_target_layer_cascade.py.
The class that implements this final filtering is shown below.

class _ProposalTargetLayer(nn.Module):
    """
    Assign object detection proposals to ground-truth targets. Produces proposal
    classification labels and bounding-box regression targets.
    """

    def __init__(self, nclasses):
        super(_ProposalTargetLayer, self).__init__()
        self._num_classes = nclasses  # number of classes
        # mean, std, and weights used when normalizing the regression targets
        self.BBOX_NORMALIZE_MEANS = torch.FloatTensor(cfg.TRAIN.BBOX_NORMALIZE_MEANS)
        self.BBOX_NORMALIZE_STDS = torch.FloatTensor(cfg.TRAIN.BBOX_NORMALIZE_STDS)
        self.BBOX_INSIDE_WEIGHTS = torch.FloatTensor(cfg.TRAIN.BBOX_INSIDE_WEIGHTS)

    def forward(self, all_rois, gt_boxes, num_boxes):
    	"""
    	all_rois: RPN生成的候选区域,shape(batch, 2000, 5)
        gt_boxes: 真实边框,shape(batch, K, 5)
        num_boxes: 真实边框数量

		return:
		rois:筛选出来的正负ROI,shape(batch, 256, 5).第一维是batch编号,后四维是坐标
		labels:筛选出来的正负ROI对应的类别标签,shape(batch, 256)
		bbox_targets: 筛选出来的正负ROI对应的偏移量,shape(batch, 256, 4)
		bbox_inside_weights:存在有真实物体对应ROI的回归权重, shape(batch, 256, 4)
		bbox_outside_weights:存在有真实物体对应ROI的权重,shape(batch, 256, 4)
		"""

        self.BBOX_NORMALIZE_MEANS = self.BBOX_NORMALIZE_MEANS.type_as(gt_boxes)
        self.BBOX_NORMALIZE_STDS = self.BBOX_NORMALIZE_STDS.type_as(gt_boxes)
        self.BBOX_INSIDE_WEIGHTS = self.BBOX_INSIDE_WEIGHTS.type_as(gt_boxes)

		
        gt_boxes_append = gt_boxes.new(gt_boxes.size()).zero_()
        gt_boxes_append[:,:,1:5] = gt_boxes[:,:,:4]	 # 取前四维,第五维是物体类别

        # 将真实边框和候选框在通道上拼接起来,shape(batch, 2000+K, 5)
        all_rois = torch.cat([all_rois, gt_boxes_append], 1)

        num_images = 1
        # cfg.TRAIN.BATCH_SIZE is the number of sampled RoIs per image, 256
        rois_per_image = int(cfg.TRAIN.BATCH_SIZE / num_images)
        # FG_FRACTION is the fraction of RoIs labeled as foreground, 0.25
        fg_rois_per_image = int(np.round(cfg.TRAIN.FG_FRACTION * rois_per_image))
        fg_rois_per_image = 1 if fg_rois_per_image == 0 else fg_rois_per_image  # fg_rois_per_image = 64

        labels, rois, bbox_targets, bbox_inside_weights = self._sample_rois_pytorch(
            all_rois, gt_boxes, fg_rois_per_image,
            rois_per_image, self._num_classes)

        bbox_outside_weights = (bbox_inside_weights > 0).float()

        return rois, labels, bbox_targets, bbox_inside_weights, bbox_outside_weights
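One detail worth flagging in forward above: the ground-truth boxes themselves are appended to the proposal list before sampling, so every object is guaranteed at least one candidate with IoU 1.0. A toy NumPy illustration of that concatenation (the shapes and box values are made up for the example):

```python
import numpy as np

# batch=1, 4 proposals, 2 gt boxes; column 0 of a RoI is the batch index
all_rois = np.zeros((1, 4, 5))
gt_boxes = np.array([[[10., 10., 50., 50., 3.],    # x1, y1, x2, y2, class
                      [60., 20., 90., 80., 7.]]])

gt_append = np.zeros_like(gt_boxes)
gt_append[:, :, 1:5] = gt_boxes[:, :, :4]          # drop the class column, shift right
all_rois = np.concatenate([all_rois, gt_append], axis=1)
# all_rois.shape -> (1, 6, 5); the last two rows are the gt boxes themselves
```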

The key code of the sampling itself follows.

    def _sample_rois_pytorch(self, all_rois, gt_boxes, fg_rois_per_image, rois_per_image, num_classes):
        """
        生成包含前景和背景的roi的随机样本示例。
        all_rois: 真实边框和候选框在通道上拼接起来,shape(batch, 2000+K, 5)
        gt_boxes: 真实边框,shape(batch, K, 5)
        fg_rois_per_image: 每一个样本中标记为前景的边框数量,数值为64
        rois_per_image:每一个样本标记为前景的边框数量,数值为256
        num_classes: 类别数
        """
        # overlaps: (rois x gt_boxes)

        # build the IoU matrix between the proposals and the ground-truth boxes
        # bbox_overlaps_batch was annotated in the post on generating RPN training labels;
        # the batched 3-D case works the same way as the 2-D one
        # overlaps has shape (batch, 2000+K, K)
        overlaps = bbox_overlaps_batch(all_rois, gt_boxes)

        # max value and argmax of each row, both of shape (batch, 2000+K)
        max_overlaps, gt_assignment = torch.max(overlaps, 2)

        batch_size = overlaps.size(0)  # batch size
        num_proposal = overlaps.size(1)  # number of proposals
        num_boxes_per_img = overlaps.size(2)  # number of ground-truth boxes

        # offset shifts each row's argmax by batch_index * K (K gt boxes per image)
        # so it indexes the flattened label vector
        offset = torch.arange(0, batch_size)*gt_boxes.size(1)
        offset = offset.view(-1, 1).type_as(gt_assignment) + gt_assignment

        # class label of the best-matching ground-truth box for each proposal, shape (batch, 2000+K)
        labels = gt_boxes[:,:,4].contiguous().view(-1)[(offset.view(-1),)].view(batch_size, -1)

        # class labels of the sampled RoIs, taken from the ground-truth boxes, shape (batch, 256)
        labels_batch = labels.new(batch_size, rois_per_image).zero_()
        # the sampled RoIs themselves, taken from the RPN regression output, shape (batch, 256, 5)
        rois_batch  = all_rois.new(batch_size, rois_per_image, 5).zero_()
        # the ground-truth boxes matched to the sampled RoIs, shape (batch, 256, 5)
        gt_rois_batch = all_rois.new(batch_size, rois_per_image, 5).zero_()

        # pick the positives and negatives that satisfy the thresholds
        for i in range(batch_size):
            # indices whose max IoU reaches the foreground threshold
            fg_inds = torch.nonzero(max_overlaps[i] >= cfg.TRAIN.FG_THRESH).view(-1)
            # numel() returns the number of elements in the tensor
            fg_num_rois = fg_inds.numel()

            # indices with IoU in [BG_THRESH_LO, BG_THRESH_HI) become background
            bg_inds = torch.nonzero((max_overlaps[i] < cfg.TRAIN.BG_THRESH_HI) &
                                    (max_overlaps[i] >= cfg.TRAIN.BG_THRESH_LO)).view(-1)
            bg_num_rois = bg_inds.numel()

            # if there are more than 64 positives, or more negatives than (256 - positives),
            # subsample them at random
            if fg_num_rois > 0 and bg_num_rois > 0:
                # subsample the positives
                fg_rois_per_this_image = min(fg_rois_per_image, fg_num_rois)

                # torch.randperm seems to have a bug in multi-GPU settings that causes a segfault.
                # See https://github.com/pytorch/pytorch/issues/1868 for more details.
                # Use numpy instead.
                #rand_num = torch.randperm(fg_num_rois).long().cuda()
                # random permutation of [0, 1, ..., fg_num_rois-1]
                rand_num = torch.from_numpy(np.random.permutation(fg_num_rois)).type_as(gt_boxes).long()
                fg_inds = fg_inds[rand_num[:fg_rois_per_this_image]]

                # subsample the negatives (drawn with replacement, so indices may repeat)
                bg_rois_per_this_image = rois_per_image - fg_rois_per_this_image

                # torch.rand seems to have a bug: it can generate very large numbers and raise an error.
                # Use numpy rand instead.
                #rand_num = (torch.rand(bg_rois_per_this_image) * bg_num_rois).long().cuda()
                rand_num = np.floor(np.random.rand(bg_rois_per_this_image) * bg_num_rois)
                rand_num = torch.from_numpy(rand_num).type_as(gt_boxes).long()
                bg_inds = bg_inds[rand_num]

            elif fg_num_rois > 0 and bg_num_rois == 0:
                # no background available: sample all 256 RoIs from the positives (with replacement)
                #rand_num = torch.floor(torch.rand(rois_per_image) * fg_num_rois).long().cuda()
                rand_num = np.floor(np.random.rand(rois_per_image) * fg_num_rois)
                rand_num = torch.from_numpy(rand_num).type_as(gt_boxes).long()
                fg_inds = fg_inds[rand_num]
                fg_rois_per_this_image = rois_per_image
                bg_rois_per_this_image = 0
            elif bg_num_rois > 0 and fg_num_rois == 0:
                # no foreground available: sample all 256 RoIs from the negatives (with replacement)
                #rand_num = torch.floor(torch.rand(rois_per_image) * bg_num_rois).long().cuda()
                rand_num = np.floor(np.random.rand(rois_per_image) * bg_num_rois)
                rand_num = torch.from_numpy(rand_num).type_as(gt_boxes).long()

                bg_inds = bg_inds[rand_num]
                bg_rois_per_this_image = rois_per_image
                fg_rois_per_this_image = 0
            else:
                raise ValueError("bg_num_rois = 0 and fg_num_rois = 0, this should not happen!")

            # concatenate the positive and negative indices
            keep_inds = torch.cat([fg_inds, bg_inds], 0)

            # gather the labels of the sampled RoIs
            labels_batch[i].copy_(labels[i][keep_inds])

            # make sure the background RoIs get label 0
            if fg_rois_per_this_image < rois_per_image:
                labels_batch[i][fg_rois_per_this_image:] = 0

            # record the sampled RoIs
            rois_batch[i] = all_rois[i][keep_inds]
            rois_batch[i,:,0] = i
            # record the matching ground-truth boxes
            gt_rois_batch[i] = gt_boxes[i][gt_assignment[i][keep_inds]]

        # offset of each sampled RoI relative to its ground-truth box, shape (batch, 256, 4)
        bbox_target_data = self._compute_targets_pytorch(
                rois_batch[:,:,1:5], gt_rois_batch[:,:,:4])

        # zero out the offsets of background RoIs and record the regression weights
        # of the RoIs matched to a real object
        bbox_targets, bbox_inside_weights = \
                self._get_bbox_regression_labels_pytorch(bbox_target_data, labels_batch, num_classes)

        return labels_batch, rois_batch, bbox_targets, bbox_inside_weights

    def _compute_targets_pytorch(self, ex_rois, gt_rois):
        """
        Compute the regression offsets of the RoIs in each image.
        ex_rois: the sampled RoIs, shape (batch, 256, 4)
        gt_rois: the matching ground-truth boxes, shape (batch, 256, 4)
        """

        assert ex_rois.size(1) == gt_rois.size(1)
        assert ex_rois.size(2) == 4
        assert gt_rois.size(2) == 4

        batch_size = ex_rois.size(0)  # batch size
        rois_per_image = ex_rois.size(1)  # number of RoIs per image

        # offsets between the RoIs and the ground-truth boxes, shape (batch, 256, 4)
        targets = bbox_transform_batch(ex_rois, gt_rois)

        # optionally normalize the offsets (subtract the mean, divide by the std)
        if cfg.TRAIN.BBOX_NORMALIZE_TARGETS_PRECOMPUTED:
            # Optionally normalize targets by a precomputed mean and stdev
            targets = ((targets - self.BBOX_NORMALIZE_MEANS.expand_as(targets))
                        / self.BBOX_NORMALIZE_STDS.expand_as(targets))

        return targets

    def _get_bbox_regression_labels_pytorch(self, bbox_target_data, labels_batch, num_classes):
        """Bounding-box regression targets (bbox_target_data) are stored in a
        compact form b x N x (class, tx, ty, tw, th)

        This function expands those targets into the 4-of-4*K representation used
        by the network (i.e. only one class has non-zero targets).

        Returns:
            bbox_target (ndarray): b x N x 4K blob of regression targets
            bbox_inside_weights (ndarray): b x N x 4K blob of loss weights
        """
        batch_size = labels_batch.size(0)  # batch size
        rois_per_image = labels_batch.size(1)  # number of RoIs per image
        clss = labels_batch
        bbox_targets = bbox_target_data.new(batch_size, rois_per_image, 4).zero_()  # shape (batch, 256, 4)
        bbox_inside_weights = bbox_target_data.new(bbox_targets.size()).zero_()  # shape (batch, 256, 4)

        for b in range(batch_size):
            # skip images that contain only background RoIs
            if clss[b].sum() == 0:
                continue
            # indices of the RoIs in this image that contain an object
            inds = torch.nonzero(clss[b] > 0).view(-1)
            for i in range(inds.numel()):
                ind = inds[i]
                bbox_targets[b, ind, :] = bbox_target_data[b, ind, :]
                # BBOX_INSIDE_WEIGHTS are the regression loss weights
                bbox_inside_weights[b, ind, :] = self.BBOX_INSIDE_WEIGHTS

        return bbox_targets, bbox_inside_weights
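The offset arithmetic used in `_sample_rois_pytorch` to gather each proposal's class label deserves a closer look: the per-image argmax indices are shifted by batch_index * K so they can index a flattened (batch*K,) label vector in one shot. A standalone NumPy sketch of the same indexing trick (toy numbers, not repo code):

```python
import numpy as np

batch_size, K = 2, 3                     # 2 images, 3 gt boxes each
gt_classes = np.array([[5, 2, 9],        # column 4 of gt_boxes, per image
                       [1, 7, 4]])
# gt_assignment[i, j]: index of the best-matching gt box for proposal j of image i
gt_assignment = np.array([[2, 0],
                          [1, 1]])

# shift each row by i*K so it indexes the flattened label vector
offset = np.arange(batch_size).reshape(-1, 1) * K + gt_assignment
labels = gt_classes.reshape(-1)[offset.reshape(-1)].reshape(batch_size, -1)
# labels -> [[9, 5], [7, 7]]
```

This avoids a Python loop over the batch: one flatten, one fancy-indexing gather, one reshape.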

Reprinted from: blog.csdn.net/weixin_41693877/article/details/107226770