Tracking without bells and whistles paper reading

The core idea: use the detector's bounding box regression to predict each target's bbox in the next frame.

The main contributions of the paper:

  1. The MOT problem is tackled by using the detector's regression head to regress each bbox from the previous frame to its position in the current frame;
  2. Common MOT failure scenarios are analyzed, and the analysis shows that other tracking methods do not handle them better than the proposed method;

Converting a detector directly into a tracker has two benefits:

  1. No tracking-specific training is required;
  2. No additional optimization is needed at inference time.

Method

Vanilla Tracktor

[Figure: Tracktor pipeline. Blue arrows: bounding box regression; red arrows: bounding box initialization.]

The method consists of two steps:

  1. Bounding box regression. This is the blue arrow in the figure above. The detector regresses the bbox $\mathbf{b}^{k}_{t-1}$ from frame $t-1$ to its new position $\mathbf{b}^{k}_{t}$ in frame $t$. Specifically, with Faster R-CNN, RoI pooling is applied to the features of the current frame $t$, but with the bbox coordinates from the previous frame. This relies on the assumption that the target does not move far between consecutive frames. The ID is carried over from $\mathbf{b}^{k}_{t-1}$ to $\mathbf{b}^{k}_{t}$. A trajectory is killed in two situations: the classification score obtained with the regression falls below a threshold, $s_{t}^{k} < \sigma_{\text{active}}$, which indicates that the target is occluded or has left the frame; or its IoU with another tracked bbox exceeds a threshold, in which case it is removed by non-maximum suppression;
  2. Bounding box initialization. To start new trajectories, the detector is also run on the whole frame, which is the red arrow in the figure above. A detection initializes a new trajectory only if its IoU with every bbox regressed in the previous step is below a threshold.
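The two steps above can be sketched as a single per-frame update. This is a minimal illustration, not the authors' implementation: `regress` is a hypothetical stand-in for the detector's regression and classification heads (in the real method it would operate on the current frame's features), and the threshold names and default values (`sigma_active`, `lambda_active`, `lambda_new`) are illustrative assumptions.

```python
from typing import Callable, Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def tracktor_step(
    tracks: Dict[int, Box],                        # track ID -> bbox in frame t-1
    regress: Callable[[Box], Tuple[Box, float]],   # hypothetical: bbox -> (new bbox, score)
    detections: List[Box],                         # full-frame detections in frame t
    next_id: int,
    sigma_active: float = 0.5,    # assumed value
    lambda_active: float = 0.6,   # assumed value
    lambda_new: float = 0.3,      # assumed value
) -> Tuple[Dict[int, Box], int]:
    # Step 1: regress every active track into frame t and drop low-score tracks.
    scored = {}
    for tid, box in tracks.items():
        new_box, score = regress(box)
        if score >= sigma_active:
            scored[tid] = (new_box, score)

    # NMS among the surviving tracks: visit by descending score and drop any
    # track that overlaps an already kept track too much.
    kept: Dict[int, Box] = {}
    for tid, (box, _) in sorted(scored.items(), key=lambda kv: kv[1][1], reverse=True):
        if all(iou(box, other) <= lambda_active for other in kept.values()):
            kept[tid] = box

    # Step 2: a detection starts a new track only if it is not already
    # covered by an existing (or just created) track.
    for det in detections:
        if all(iou(det, other) < lambda_new for other in kept.values()):
            kept[next_id] = det
            next_id += 1
    return kept, next_id
```

Note that the ID never needs an explicit association step: it simply stays attached to the box that was regressed.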

Extension

This part extends the vanilla tracker, because the method above may not work well when the camera is moving or the video frame rate is very low.

Motion model: For sequences with a moving camera, apply camera motion compensation (CMC), aligning consecutive frames by image registration with Enhanced Correlation Coefficient (ECC) maximization. For sequences with very low frame rates, apply a constant velocity assumption (CVA) to each target.
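Both motion models amount to moving the previous bbox before it is handed to the regression step. The sketch below shows one plausible way to do this, under two assumptions: the 2x3 affine warp between frames has already been estimated elsewhere (for example by an ECC-based registration routine, which is not shown), and the velocity is taken from the box centers of the last two frames. The function names are hypothetical.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Affine = Tuple[Tuple[float, float, float], Tuple[float, float, float]]  # 2x3 matrix


def warp_box(box: Box, warp: Affine) -> Box:
    """CMC: move both box corners with the frame-to-frame affine warp.

    Simplification: only the two corners are transformed, so the result is
    exact for translation and scaling, and approximate under rotation.
    """
    (a, b, tx), (c, d, ty) = warp
    x1, y1, x2, y2 = box
    return (a * x1 + b * y1 + tx, c * x1 + d * y1 + ty,
            a * x2 + b * y2 + tx, c * x2 + d * y2 + ty)


def center(box: Box) -> Tuple[float, float]:
    return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)


def constant_velocity_shift(prev_box: Box, last_box: Box) -> Box:
    """CVA: shift the latest box by the center displacement of the last step."""
    (px, py), (lx, ly) = center(prev_box), center(last_box)
    vx, vy = lx - px, ly - py
    x1, y1, x2, y2 = last_box
    return (x1 + vx, y1 + vy, x2 + vx, y2 + vy)
```

For example, a pure camera translation of (5, -2) pixels moves the box (0, 0, 10, 10) to (5, -2, 15, 8), which gives the regression a better starting position in the new frame.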

ReID: Store the ReID features of killed trajectories. When a new trajectory is initialized, compare its features against the stored ones; on a match, the new trajectory takes over the ID of the killed trajectory.
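The matching step can be illustrated with a nearest-neighbor lookup in embedding space. This is only a sketch under assumptions: the features are plain float vectors, the distance is Euclidean, and `max_distance` is a hypothetical threshold; the notes above do not specify the distance measure or the threshold.

```python
import math
from typing import Dict, List, Optional


def match_killed_track(
    new_feature: List[float],
    killed_features: Dict[int, List[float]],  # killed track ID -> stored ReID feature
    max_distance: float,                      # assumed matching threshold
) -> Optional[int]:
    """Return the ID of the closest killed track, or None if none is close enough."""
    best_id, best_dist = None, max_distance
    for tid, feat in killed_features.items():
        d = math.dist(new_feature, feat)  # Euclidean distance (Python 3.8+)
        if d < best_dist:
            best_id, best_dist = tid, d
    return best_id


killed = {3: [0.0, 1.0], 7: [1.0, 0.0]}
recovered = match_killed_track([0.9, 0.1], killed, max_distance=0.5)
```

If a match is found, the new trajectory would reuse that ID and the matched entry would be removed from the store; otherwise it gets a fresh ID.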

Origin blog.csdn.net/fuss1207/article/details/131235156