【SOT】SiamRPN Code Notes

Code source: https://github.com/laisimiao/siamrpn.pytorch

The following are notes on key points of the SiamRPN code above, taken while reading it together with a blog post.

1. The template frame and the detection (search) frame pass through the same Siamese network to extract features, which then go through the classification and regression branches of the RPN; the template feature acts as a kernel that performs a correlation operation over the detection feature.

2. The classification branch predicts which anchors on the original image have an IoU with the target above a certain threshold (their corresponding points on the final feature map are labeled 1); the regression branch predicts the offset (shift) between each anchor and the target box in (x, y, w, h).
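The correlation in point 1 can be sketched as sliding the template feature over the search feature like a convolution kernel. This is a minimal NumPy sketch that ignores the 2K/4K channel grouping of the actual RPN head (the repo presumably uses a grouped `F.conv2d` for this); the shapes match the ones noted below (4x4 template feature, 20x20 search feature, 17x17 response):

```python
import numpy as np

def xcorr(x, z):
    """Cross-correlation: slide template feature z over search feature x.
    x: (C, Hx, Wx) search feature; z: (C, Hz, Wz) template ("kernel").
    Returns a (Hx-Hz+1, Wx-Wz+1) single-channel response map (sketch only)."""
    C, Hx, Wx = x.shape
    _, Hz, Wz = z.shape
    Ho, Wo = Hx - Hz + 1, Wx - Wz + 1
    out = np.zeros((Ho, Wo))
    for i in range(Ho):
        for j in range(Wo):
            out[i, j] = np.sum(x[:, i:i + Hz, j:j + Wz] * z)
    return out

# 4x4 template feature slid over a 20x20 search feature -> 17x17 response map
resp = xcorr(np.random.randn(256, 20, 20), np.random.randn(256, 4, 4))
print(resp.shape)  # (17, 17)
```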

Data preprocessing

augmentation.py

  1. gray augmentation (optional): BGR->GRAY->BGR
  2. shift scale augmentation:
    1. if self.scale: in (cx, cy, w, h) form, keep the center point fixed and scale the size
    2. if self.shift: in (x1, y1, x2, y2) form, shift up/down/left/right
    3. the bbox is transformed in the same way
  3. color augmentation
  4. blur augmentation
  5. flip augmentation
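The gray augmentation in step 1 (BGR -> GRAY -> BGR) can be sketched as follows. The repo presumably uses `cv2.cvtColor`; this sketch replicates it with the BT.601 luminance weights, which is what OpenCV's BGR-to-gray conversion uses:

```python
import numpy as np

def gray_augmentation(img_bgr):
    """Sketch of the optional gray augmentation: convert a BGR image to
    grayscale, then replicate the gray channel back to 3 channels so the
    network still receives a 3-channel input (BGR->GRAY->BGR)."""
    b, g, r = img_bgr[..., 0], img_bgr[..., 1], img_bgr[..., 2]
    gray = 0.114 * b + 0.587 * g + 0.299 * r  # BT.601 weights
    return np.repeat(gray[..., None], 3, axis=2)
```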

datasets.py (VIDYTBBLMDB)

  1. Randomly select a frame from the video as the template frame, then randomly select another frame within frame_range before or after it (without exceeding the video length) as the search frame
  2. Process the template frame and search frame –> template_image, template_box, search_image, search_box (image and target-box label)
  3. Data Augmentation –> template image, search image, bbox of search image
  4. ⭐️ Set the anchors; the output contains two formats: (x1, y1, x2, y2) and (cx, cy, w, h)
  5. ⭐️ Generate cls, delta, delta_weight according to the set anchor and real bbox, that is, classification label, regression label, regression weight
  6. ⭐️ The final output is of the form { 'template': template, 'search': search, 'label_cls': cls, 'label_loc': delta, 'label_loc_weight': delta_weight }
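Step 4 (the anchors in both formats) can be sketched as below. The stride, base size, and aspect ratios are assumptions matching common SiamRPN settings, not values read from this repo:

```python
import numpy as np

def make_anchors(score_size=17, stride=8, base=64,
                 ratios=(0.33, 0.5, 1.0, 2.0, 3.0)):
    """Build K=len(ratios) anchors per position of the score map and return
    them in both corner (x1,y1,x2,y2) and center (cx,cy,w,h) formats.
    Values (stride=8, base=64, 5 ratios) are assumed common settings."""
    ws = np.array([np.sqrt(base * base / r) for r in ratios])
    hs = ws * np.array(ratios)          # h / w = ratio, w * h = base**2
    # grid of anchor centers, zero-centered on the search patch
    xs = np.arange(score_size) * stride - (score_size // 2) * stride
    cx, cy = np.meshgrid(xs, xs)
    K = len(ratios)
    cx = np.broadcast_to(cx, (K, score_size, score_size))
    cy = np.broadcast_to(cy, (K, score_size, score_size))
    w = ws[:, None, None] * np.ones_like(cx)
    h = hs[:, None, None] * np.ones_like(cx)
    cxywh = np.stack([cx, cy, w, h])                                # (4, K, 17, 17)
    xyxy = np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2])
    return xyxy, cxywh
```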

⭐️ Indicates the difference from SiamFC

SiamRPN(model.py)

    z       x
    |       |
backbone backbone
    \       /
     RpnHead
    /       \
  cls       reg

AlexNet (backbone)

RpnHead

cls:
       z_ft        x_ft
        |           |
      Conv2d      Conv2d
 [N,2K*256,4,4][N, 256, 20, 20]
        \           /
            Conv2d
              |
           pred_cls(cross_entropy)
reg:
       z_ft        x_ft
        |           |
      Conv2d      Conv2d
 [N,4K*256,4,4][N, 256, 20, 20]
        \           /
            Conv2d
              |
           pred_reg(smooth L1)
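The regression targets that pred_reg is trained against (the delta label from step 5 of datasets.py) are the standard anchor-relative offsets; a sketch, assuming the usual SiamRPN/Faster R-CNN encoding:

```python
import numpy as np

def encode_delta(anchor, box):
    """Regression label: offsets of the ground-truth box relative to an
    anchor, both given as (cx, cy, w, h). The smooth-L1 loss of the reg
    branch is computed against these four values."""
    acx, acy, aw, ah = anchor
    cx, cy, w, h = box
    return np.array([(cx - acx) / aw,      # dx: center shift in anchor widths
                     (cy - acy) / ah,      # dy: center shift in anchor heights
                     np.log(w / aw),       # dw: log scale ratio
                     np.log(h / ah)])      # dh: log scale ratio
```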

SiamRPNTracker (track.py, testing process)

  • set to eval mode

init(self, image, box)

  • This part uses the prior information of the first frame (the first-frame image and its ground-truth bbox), which makes tracking equivalent to one-shot detection; the template frame is fixed and acts as a kernel
  1. Only for z: crop a patch centered on the box center and slightly larger than the box (size s_z), resize it to 127, and pad with the mean color if necessary
  2. Only for z: send it to the backbone
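The crop sizes s_z and s_x follow the SiamFC/SiamRPN context convention; a sketch of the arithmetic (the exact formula is assumed from that convention, not read from this repo):

```python
import math

def crop_sizes(w, h):
    """Template crop size s_z: pad the target by p=(w+h)/2 on each side and
    take the square root of the padded area; the search crop is the same
    region scaled up by the 255/127 patch-size ratio (s_x = s_z * 255/127)."""
    p = (w + h) / 2
    s_z = math.sqrt((w + p) * (h + p))
    s_x = s_z * 255 / 127
    return s_z, s_x
```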

update(self, image)

  • Input a subsequent frame; apply scale and aspect-ratio penalties to the predicted scores, use a cosine window to suppress large displacements, and then predict the target position from the anchor with the highest classification score
  1. Only for x: crop a patch centered at the previous box center with size s_x (s_x = s_z * 255/127), resize it to 255, and pad with the mean color if necessary
  2. Only for x: send it to the backbone
  3. Send the feature extraction result of z obtained in init and the feature extraction result of x in the previous step to rpnhead to get outputs
  4. ⭐️ Convert the classification prediction and regression prediction of outputs into score (via softmax) and pred_bbox (corrected anchor)
  5. ⭐️ scale penalty, aspect ratio penalty, window penalty
  6. find peak point
  7. ⭐️ Adjust the bbox size (smoothed with a learning rate) and position, and output the bbox
  8. Update the target size (size) and position (center_pos) to prepare for the next frame
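Steps 4-6 above can be sketched as follows: decode pred_reg back into a box, then blend the (already penalized) score with a cosine window before taking the peak. The window_influence value is an assumed typical hyperparameter, not read from this repo:

```python
import numpy as np

def decode_delta(delta, anchor):
    """Step 4: invert the regression encoding to get the corrected anchor
    as a box (cx, cy, w, h)."""
    dx, dy, dw, dh = delta
    acx, acy, aw, ah = anchor
    return np.array([acx + dx * aw, acy + dy * ah,
                     aw * np.exp(dw), ah * np.exp(dh)])

def pick_peak(pscore, window_influence=0.44):
    """Steps 5-6: blend the penalized score map with a Hanning (cosine)
    window to suppress large displacements, then return the argmax index.
    window_influence=0.44 is an assumption."""
    n = pscore.shape[-1]
    win = np.outer(np.hanning(n), np.hanning(n))
    blended = pscore * (1 - window_influence) + win * window_influence
    return np.unravel_index(np.argmax(blended), blended.shape)
```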

train.py (training process)

  1. Build the SiamRPN model
  2. Build the dataset (VIDYTBBLMDB class) and dataloader
  3. Import the pre-trained model (siamrpn.backbone)
  4. Freeze the parameters of the pre-trained backbone
  5. Set the optimizer and learning rate adjustment strategy
  6. Unfreeze the pre-trained backbone parameters starting from the 10th epoch
  7. Send the template and search area to the network: siamrpn(data['template'], data['search']), get the output pred_cls, pred_reg
  8. Use pred_cls, pred_reg together with data['label_cls'], data['label_loc'], data['label_loc_weight'] to compute the loss
  9. Record information such as loss and time
  10. Gradient backpropagation and optimizer update, learning rate update before the end of an epoch
  11. save model
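The loss in step 8 combines cross-entropy classification with a smooth-L1 regression term masked by label_loc_weight; a sketch of the regression side and the combination (the balancing weight reg_lambda is an assumption):

```python
import numpy as np

def smooth_l1(x):
    """Smooth L1: quadratic near zero, linear beyond |x| = 1."""
    ax = np.abs(x)
    return np.where(ax < 1, 0.5 * ax * ax, ax - 0.5)

def total_loss(cls_loss, pred_reg, label_loc, label_loc_weight, reg_lambda=1.2):
    """Step 8 sketch: label_loc_weight masks/normalizes the regression loss
    over positive anchors; reg_lambda (assumed value) balances the two terms."""
    loc_loss = np.sum(smooth_l1(pred_reg - label_loc) * label_loc_weight)
    return cls_loss + reg_lambda * loc_loss
```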


Origin blog.csdn.net/zylooooooooong/article/details/123192884