SiamRPN code analysis: test


Preface

  This article is the last part of the SiamRPN code analysis and covers the test phase. The code is analyzed in two steps: init and update (tracking).


1、init

  The main task of init is to compute, from the template branch, the convolution kernels that will later be cross-correlated with the feature maps of subsequent frames. After cropping the 127x127x3 target template image, a batch dimension is added and the image is fed into the network; the returned classification and regression convolution kernels have shapes [10,256,4,4] and [20,256,4,4] respectively. Note that these two kernels are not returned directly here but are stored on self (the SiamRPNTracker class; see the first post for the analysis of the architecture). Secondly, init also records the target's current position, width and height; subsequent updates will refine them, i.e. predict the new target position and size.

    # initial frame
    def init(self, frame, bbox):
        # [l, t, w, h] -> [center_x, center_y, w, h]
        self.bbox = np.array([bbox[0]-1 + (bbox[2]-1) / 2, bbox[1]-1 + (bbox[3]-1) / 2, bbox[2], bbox[3]])
        # target center [c_x, c_y], kept for later use
        self.pos = np.array([bbox[0]-1 + (bbox[2]-1) / 2, bbox[1]-1 + (bbox[3]-1) / 2])
        # target size [w, h], kept for later use
        self.target_sz = np.array([bbox[2], bbox[3]])
        # original target size [w, h], kept for later use
        self.origin_target_sz = np.array([bbox[2], bbox[3]])
        # per-channel R/G/B mean of the frame, img_mean.shape = (3,)
        self.img_mean = np.mean(frame, axis=(0, 1))
        # crop the template image: returns a 127x127x3 patch
        exemplar_img = get_exemplar_image(frame, self.bbox, config.exemplar_size, config.context_amount, self.img_mean)
        # add a batch dimension before feeding the network
        exemplar_img = self.transforms(exemplar_img)[None, :, :, :]
        self.model.track_init(exemplar_img.cuda())
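
  For reference, here is a minimal sketch of what track_init might do inside the model. The layer names (featureExtract, conv_cls1, conv_r1) and the toy backbone are assumptions for illustration, not the repo's actual code; see the first post for the real architecture.

    import torch
    import torch.nn as nn

    class SiamRPNSketch(nn.Module):
        """Minimal template branch; the real featureExtract is an AlexNet-style CNN."""
        def __init__(self, anchor_num=5):
            super().__init__()
            self.anchor_num = anchor_num
            # toy stand-in producing a [1, 256, 6, 6] template feature
            self.featureExtract = nn.Sequential(
                nn.Conv2d(3, 256, kernel_size=11, stride=8),
                nn.AdaptiveAvgPool2d(6),
            )
            self.conv_cls1 = nn.Conv2d(256, 256 * 2 * anchor_num, kernel_size=3)
            self.conv_r1 = nn.Conv2d(256, 256 * 4 * anchor_num, kernel_size=3)

        def track_init(self, exemplar_img):
            # exemplar_img: [1, 3, 127, 127] -> template feature [1, 256, 6, 6]
            f = self.featureExtract(exemplar_img)
            # 3x3 convs lift the template feature into per-anchor correlation
            # kernels, which are cached on the model rather than returned
            self.cls_kernel = self.conv_cls1(f).view(2 * self.anchor_num, 256, 4, 4)  # [10,256,4,4]
            self.reg_kernel = self.conv_r1(f).view(4 * self.anchor_num, 256, 4, 4)    # [20,256,4,4]

  The cached kernels would then serve as correlation filters over the detection-branch feature map in track_update.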

2、update(tracking)

  The update process is likewise analyzed by following the code.

    def update(self, frame):
        # crop around the previous frame's bbox; the img_mean from the initial
        # frame is reused to pad all subsequent frames
        # returns a 271x271x3 patch plus scale_x, the crop-to-original scale factor
        instance_img_np, _, _, scale_x = get_instance_image(frame, self.bbox, config.exemplar_size, config.instance_size, config.context_amount, self.img_mean)
        """————————————————get classification scores and regression parameters————————————"""
        # add a batch dimension before feeding the network
        instance_img = self.transforms(instance_img_np)[None, :, :, :]
        # returns score.shape = [1,10,19,19], regression.shape = [1,20,19,19]
        pred_score, pred_regression = self.model.track_update(instance_img.cuda())
        # [1,10,19,19] -> [1,2,5*19*19] -> [1,1805,2]; conf = classification confidence
        pred_conf = pred_score.reshape(-1, 2, config.anchor_num * config.score_size * config.score_size).permute(0, 2, 1)
        # [1,20,19,19] -> [1,4,5*19*19] -> [1,1805,4]; offset = displacements that fine-tune the anchors
        pred_offset = pred_regression.reshape(-1, 4, config.anchor_num * config.score_size * config.score_size).permute(0, 2, 1)

First the 271x271x3 detection image is obtained and, after adding a batch dimension, fed into the network; track_update performs the cross-correlation and returns the classification scores and regression parameters, which are then reshaped as above for later use.
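
The next block also needs self.anchors, a (1805, 4) array of [cx, cy, w, h] anchors: 5 anchor shapes tiled over every position of the 19x19 score map. Below is a sketch of how such anchors are typically generated, consistent with the config values used above (anchor_num=5, score_size=19, total stride 8); the repo's own anchor-generation helper may differ in details.

    import numpy as np

    def generate_anchors(total_stride=8, scales=(8,),
                         ratios=(0.33, 0.5, 1, 2, 3), score_size=19):
        """Sketch: build (anchor_num * score_size**2, 4) anchors as [cx, cy, w, h]."""
        anchor_num = len(scales) * len(ratios)
        anchor = np.zeros((anchor_num, 4), dtype=np.float32)
        size = total_stride * total_stride            # reference area of one cell
        idx = 0
        for ratio in ratios:
            w = int(np.sqrt(size / ratio))
            h = int(w * ratio)
            for scale in scales:
                anchor[idx, 2], anchor[idx, 3] = w * scale, h * scale
                idx += 1
        # tile the anchor shapes over all positions, centred on the search patch
        anchor = np.tile(anchor, score_size * score_size).reshape((-1, 4))
        ori = -(score_size // 2) * total_stride
        xx, yy = np.meshgrid([ori + total_stride * dx for dx in range(score_size)],
                             [ori + total_stride * dy for dy in range(score_size)])
        anchor[:, 0] = np.tile(xx.flatten(), (anchor_num, 1)).flatten()
        anchor[:, 1] = np.tile(yy.flatten(), (anchor_num, 1)).flatten()
        return anchor                                 # (1805, 4) for these defaults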

        # pred_offset=[1,1805,4] -> delta=(1805,4): the regression parameters
        delta = pred_offset[0].cpu().detach().numpy()
        # adjust the (1805,4) anchors with delta; returns the adjusted anchors, i.e. box_pred (1805,4)
        box_pred = box_transform_inv(self.anchors, delta)
        # pred_conf=[1,1805,2]
        # score_pred.shape = (1805,); index 1 selects the positive (foreground) class
        score_pred = F.softmax(pred_conf, dim=2)[0, :, 1].cpu().detach().numpy()  # predicted classification scores

  The 1805 regression parameters are applied to the 1805 anchors to refine their position information, and a softmax over each anchor's foreground/background pair of classification scores yields 1805 foreground probabilities (each pair sums to 1). The anchor with the largest value corresponds to the prediction box.
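
box_transform_inv is presumably the standard RPN inverse transform: the x and y offsets are scaled by the anchor size, and the w and h offsets act exponentially so the predicted sizes stay positive. A minimal sketch under that assumption:

    import numpy as np

    def box_transform_inv(anchors, offset):
        """Apply regression offsets (dx, dy, dw, dh) to [cx, cy, w, h] anchors."""
        cx, cy, w, h = anchors[:, 0], anchors[:, 1], anchors[:, 2], anchors[:, 3]
        dx, dy, dw, dh = offset[:, 0], offset[:, 1], offset[:, 2], offset[:, 3]
        return np.stack([cx + dx * w,        # shift the centre in units of anchor size
                         cy + dy * h,
                         w * np.exp(dw),     # exponential keeps w and h positive
                         h * np.exp(dh)], axis=1)   # (1805, 4), still [cx, cy, w, h]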

        # scale penalty: a number >= 1
        s_c = change(sz(box_pred[:, 2], box_pred[:, 3]) / (sz_wh(self.target_sz * scale_x)))
        # aspect-ratio penalty: a number >= 1
        r_c = change((self.target_sz[0] / self.target_sz[1]) / (box_pred[:, 2] / box_pred[:, 3]))
        # combined scale and ratio penalty; penalty_k=0.22, penalty is at most 1, i.e. no penalty
        penalty = np.exp(-(r_c * s_c - 1.) * config.penalty_k)
        pscore = penalty * score_pred  # multiply each anchor's foreground score by its penalty factor
        pscore = pscore * (1 - config.window_influence) + self.window * config.window_influence  # then blend in the cosine window
        max_pscore_id = np.argmax(pscore)  # index of the highest score

The premise of the penalty is the assumption that the target's size (scale and aspect ratio) changes only smoothly between adjacent frames, so two penalty terms, scale and ratio, are added; it is likewise assumed that the target's position does not move much between adjacent frames, so a cosine window suppresses large displacements. As the paper puts it, the penalty suppresses large changes in scale and aspect ratio, and the cosine window suppresses large displacements. In the code the penalty is penalty = e^(-penalty_k * (s_c * r_c - 1)): if an anchor's size and ratio are close to those of the previous frame, then s_c = r_c = 1 and penalty = e^0 = 1, i.e. no penalty. After the penalty and the cosine window are applied, the index of the largest foreground score is taken and recorded as max_pscore_id.
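
In SiamRPN-style trackers the helpers change, sz and sz_wh usually look like the sketch below (an assumption; the repo may differ slightly). change(r) = max(r, 1/r) is why s_c and r_c are always >= 1, and sz folds the context padding into the equivalent size, as in the paper.

    import numpy as np

    def change(r):
        # symmetric ratio: 1 when r == 1, grows whichever way r deviates
        return np.maximum(r, 1. / r)

    def sz(w, h):
        # equivalent size including context padding
        pad = (w + h) * 0.5
        return np.sqrt((w + pad) * (h + pad))

    def sz_wh(wh):
        pad = (wh[0] + wh[1]) * 0.5
        return np.sqrt((wh[0] + pad) * (wh[1] + pad))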
  Once max_pscore_id is obtained, the anchor with the largest foreground score is taken as the prediction box. Since the regression offsets were already applied above, this anchor is the final prediction box. The remaining step is to take the center position and size of the new prediction box and update the target state, which is not analyzed in detail; a rough sketch follows.
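
A sketch of that update step (config names such as lr_box, min_scale and max_scale are assumptions): the predicted box is mapped back to original-image coordinates through scale_x, and the size is smoothed with a learning rate modulated by the penalty and the score.

        # continuation of update(); a sketch, not the repo's exact code
        target = box_pred[max_pscore_id, :] / scale_x  # back to original-image scale
        res_x = np.clip(target[0] + self.pos[0], 0, frame.shape[1])
        res_y = np.clip(target[1] + self.pos[1], 0, frame.shape[0])
        # adaptive rate: confident, lightly penalized predictions update faster
        lr = penalty[max_pscore_id] * score_pred[max_pscore_id] * config.lr_box
        res_w = np.clip(self.target_sz[0] * (1 - lr) + target[2] * lr,
                        config.min_scale * self.origin_target_sz[0],
                        config.max_scale * self.origin_target_sz[0])
        res_h = np.clip(self.target_sz[1] * (1 - lr) + target[3] * lr,
                        config.min_scale * self.origin_target_sz[1],
                        config.max_scale * self.origin_target_sz[1])
        self.pos = np.array([res_x, res_y])
        self.target_sz = np.array([res_w, res_h])
        self.bbox = np.array([res_x, res_y, res_w, res_h])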
  One thing I forgot to mention: NMS is not used during testing, although the paper mentions using NMS to obtain the final bounding box. Personally I don't think it makes much sense here: tracking only needs to find the maximum foreground score, so NMS is not required. NMS exists to eliminate overlapping anchors, and taking the maximum foreground score does not involve the overlap relationships between anchors. In detection tasks, however, NMS is a very important step.
Thanks for reading!

Original article: blog.csdn.net/qq_41831753/article/details/113930169