Reading notes: SORT (Simple Online and Realtime Tracking)

Paper: Simple Online and Realtime Tracking

One, introduction

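The tracker is built from a basic Kalman filter and the Hungarian algorithm. With this simple tracker, changing only the detector improves tracking performance by up to 18.9%, and the method reaches accuracy comparable to state-of-the-art (SOTA) online trackers.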

This is an online method. The authors argue that adding complexity in the form of object re-identification introduces significant overhead into the tracking framework and can limit its use in real-time applications. Appearance features beyond the detection component are therefore ignored during tracking: only the position and size of the bounding box are used for motion estimation and data association. To avoid unnecessary complexity, both short-term and long-term occlusion are also ignored.

The method focuses on handling frame-to-frame association efficiently and reliably. Rather than aiming for robustness to detection errors, it relies on recent advances in visual object detection to solve the detection problem directly. This minimal formulation of tracking improves both the efficiency and the reliability of online tracking, as shown in Figure 1.
[Figure 1: image not available]

Main contributions:

  1. The power of CNN-based detection is leveraged in the MOT context.
  2. A pragmatic tracking approach based on the Kalman filter and the Hungarian algorithm is presented and evaluated on a recent MOT benchmark.
  3. The code will be open-sourced to help establish a baseline method for research experiments and for use in collision avoidance applications.

Two, method

1. Detection

To take advantage of the rapid progress in CNN-based detection, Faster Region CNN (FrRCNN) is used as the detection framework. FrRCNN is an end-to-end framework with two stages. The first stage extracts features and proposes candidate regions for the second stage, which then classifies the object in each proposed region. The advantage is that parameters are shared between the two stages, which makes detection efficient. In addition, the network architecture itself can be swapped for any design, which allows rapid experiments with different architectures to improve detection performance.

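The experiments compare the two network architectures provided with FrRCNN: FrRCNN (ZF) and FrRCNN (VGG16). FrRCNN is applied with the default parameters learned for the PASCAL VOC challenge. Since only pedestrians are of interest, all other classes are ignored, and only person detections with an output probability above 50% are passed to the tracking framework.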
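
The filtering step described above can be sketched roughly as follows. This is only an illustration: the `Detection` tuple, the `keep_pedestrians` helper and the sample values are hypothetical names and data made up for this note, not part of the paper or its code.

```python
from typing import NamedTuple, Tuple


class Detection(NamedTuple):
    """Hypothetical container for one detector output."""
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
    label: str                              # class name reported by the detector
    score: float                            # output probability in [0, 1]


def keep_pedestrians(detections, min_score=0.5):
    """Keep only 'person' detections whose score is above min_score."""
    return [d for d in detections if d.label == "person" and d.score > min_score]


# Made-up detections for a single frame.
dets = [
    Detection((10, 20, 50, 120), "person", 0.92),
    Detection((200, 40, 260, 90), "car", 0.88),
    Detection((300, 30, 330, 110), "person", 0.35),
]
kept = keep_pedestrians(dets)
```

Only the first detection survives: the car is dropped by class, and the second person is dropped by score.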

Comparing FrRCNN detections with ACF detections shows that detector quality has a large effect on tracking performance.

2. Estimation Model

We use a linear constant velocity model that does not depend on other objects and camera motion to approximate the inter-frame displacement of each object. Each target state is modeled as:

$$x = [u, v, s, r, \dot{u}, \dot{v}, \dot{s}]^T$$

Here u and v are the horizontal and vertical pixel coordinates of the target center, s and r are the scale (area) and the aspect ratio of the target bounding box, and the remaining components are the velocities of u, v and s. The aspect ratio is assumed to be constant, so it has no velocity term. When a detection is associated with a target, the detected bounding box is used to update the target state, and the velocity components are solved optimally within the Kalman filter framework. If no detection is associated with the target, its state is simply predicted with the linear velocity model, without correction.
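
As a rough sketch of this state representation, the snippet below converts a corner-format box into [u, v, s, r] and applies one constant-velocity prediction step to the state mean. It deliberately leaves out the covariance propagation and the measurement update of a full Kalman filter. The function names and the choice r = width / height are assumptions made for illustration.

```python
import math


def bbox_to_z(box):
    """[x1, y1, x2, y2] -> [u, v, s, r]: center, area, aspect ratio (w / h)."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return [x1 + w / 2.0, y1 + h / 2.0, w * h, w / h]


def z_to_bbox(z):
    """[u, v, s, r] -> [x1, y1, x2, y2]."""
    u, v, s, r = z
    w = math.sqrt(s * r)  # s = w * h and r = w / h, so w^2 = s * r
    h = s / w
    return [u - w / 2.0, v - h / 2.0, u + w / 2.0, v + h / 2.0]


def predict(x, dt=1.0):
    """One constant-velocity step on x = [u, v, s, r, du, dv, ds].

    Only the state mean is propagated; r has no velocity term and stays fixed.
    """
    u, v, s, r, du, dv, ds = x
    return [u + dt * du, v + dt * dv, s + dt * ds, r, du, dv, ds]


# A 40x80 box moving 2 px right and 1 px up per frame (made-up numbers).
x = bbox_to_z([10, 20, 50, 100]) + [2.0, -1.0, 0.0]
x_next = predict(x)
box_next = z_to_bbox(x_next[:4])
```

With these numbers the predicted box is the same 40x80 rectangle shifted by (2, -1). A target without an associated detection would just keep being propagated like this.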

3. Data Association

When assigning detections to existing targets, the bounding box geometry of each target is first estimated by predicting its new position in the current frame. The assignment cost matrix is then computed as the IoU distance between each detection and every predicted bounding box of the existing targets, and the assignment is solved with the Hungarian algorithm. Any assignment whose IoU is below a minimum threshold is rejected. Using the IoU distance in this way implicitly handles short-term occlusion caused by passing targets.
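
A minimal sketch of this association step is shown below. To stay dependency-free it finds the optimal assignment by exhaustive search over permutations, which gives the same result as the Hungarian algorithm on small inputs but is far too slow for real scenes; a real implementation would call a proper solver instead. The `associate` helper and the threshold value `iou_min=0.3` are assumptions for illustration.

```python
from itertools import permutations


def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def associate(detections, predictions, iou_min=0.3):
    """Return (matches, unmatched_detections, unmatched_tracks) as index lists."""
    n_d, n_t = len(detections), len(predictions)
    overlap = [[iou(d, t) for t in predictions] for d in detections]

    # Exhaustive stand-in for the Hungarian algorithm: every item of the smaller
    # set is paired with a distinct item of the larger set, and the pairing with
    # the highest total IoU (lowest total IoU distance) wins.
    small, large = min(n_d, n_t), max(n_d, n_t)
    best_pairs, best_total = [], -1.0
    for perm in permutations(range(large), small):
        pairs = [(i, j) if n_d <= n_t else (j, i) for i, j in enumerate(perm)]
        total = sum(overlap[d][t] for d, t in pairs)
        if total > best_total:
            best_pairs, best_total = pairs, total

    # Reject assignments whose overlap is below the threshold.
    matches = [(d, t) for d, t in best_pairs if overlap[d][t] >= iou_min]
    matched_d = {d for d, _ in matches}
    matched_t = {t for _, t in matches}
    unmatched_d = [d for d in range(n_d) if d not in matched_d]
    unmatched_t = [t for t in range(n_t) if t not in matched_t]
    return matches, unmatched_d, unmatched_t


# Two predicted track boxes and three detections (made-up numbers).
predictions = [[0, 0, 10, 10], [20, 20, 30, 30]]
detections = [[21, 20, 31, 30], [1, 0, 11, 10], [50, 50, 60, 60]]
matches, unmatched_dets, unmatched_trks = associate(detections, predictions)
```

Here detection 0 pairs with track 1, detection 1 pairs with track 0, and detection 2 overlaps nothing, so it is left unmatched and becomes a candidate for a new track.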

4. Creating and deleting track IDs

Any detection whose overlap with every existing target is below the IoU threshold is treated as an untracked object: a new tracker with a new track ID is created from its bounding box, with the initial velocity set to 0. The new tracker then goes through a probationary period, during which the target must be associated with detections to accumulate enough evidence. This prevents false positives from being tracked.

A track is terminated when it has not been detected for T_Lost frames. In the experiments T_Lost is set to 1, for two reasons: the constant velocity model is a poor predictor of the true dynamics, and the work focuses on frame-to-frame tracking, so object re-identification is beyond its scope. In addition, deleting lost targets early helps efficiency. If an object reappears, tracking implicitly resumes under a new identity.
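
The creation and deletion rules can be sketched as a small bookkeeping class. This is an illustrative assumption, not the authors' code: the `TrackManager` class, the `min_hits` probation length of 3, and the reading of T_Lost as "delete once a track has gone `t_lost` consecutive frames without a detection" are all choices made for this sketch.

```python
from itertools import count


class Track:
    """Minimal track record: identity, last box, and evidence counters."""

    def __init__(self, track_id, box):
        self.id = track_id
        self.box = box
        self.hits = 1    # detections associated so far (including the first)
        self.misses = 0  # consecutive frames without an associated detection


class TrackManager:
    def __init__(self, t_lost=1, min_hits=3):
        self.t_lost = t_lost      # frames without detection before deletion
        self.min_hits = min_hits  # hypothetical probation length
        self.tracks = []
        self._ids = count(1)

    def update(self, matched, new_boxes):
        """matched maps track id -> detection box; new_boxes are unmatched detections.

        Returns the ids of tracks that have passed probation.
        """
        for trk in self.tracks:
            if trk.id in matched:
                trk.box = matched[trk.id]
                trk.hits += 1
                trk.misses = 0
            else:
                trk.misses += 1
        # Delete tracks that have been lost for t_lost frames.
        self.tracks = [t for t in self.tracks if t.misses < self.t_lost]
        # Every unmatched detection starts a new track under a new id.
        for box in new_boxes:
            self.tracks.append(Track(next(self._ids), box))
        return [t.id for t in self.tracks if t.hits >= self.min_hits]


mgr = TrackManager(t_lost=1, min_hits=3)
reported = [
    mgr.update({}, [[0, 0, 10, 10]]),    # frame 1: new track 1, on probation
    mgr.update({1: [1, 0, 11, 10]}, []), # frame 2: second hit
    mgr.update({1: [2, 0, 12, 10]}, []), # frame 3: third hit, now reported
    mgr.update({}, []),                  # frame 4: missed once, deleted
    mgr.update({}, [[4, 0, 14, 10]]),    # frame 5: reappears as track 2
]
```

The last two frames show the behavior described above: after a single missed frame the track is dropped, and when the object reappears it is tracked again under a new identity.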

Three, summary

This paper proposes an online tracking framework based on frame-to-frame prediction and association. It shows that tracking quality depends heavily on detection performance: with recent advances in detection, state-of-the-art tracking quality can be reached using only classical tracking methods. The simplicity of the framework makes it well suited as a baseline, allowing new methods to focus on object re-identification to handle long-term occlusion.

Origin blog.csdn.net/qq_41214679/article/details/110276301