[Paper Interpretation] Hybrid-SORT: Weak Cues Matter for Online Multi-Object Tracking

Because the baseline of Hybrid-SORT is built on OC-SORT, it is recommended to first understand ByteTrack and OC-SORT before reading this.

1 Introduction

1.1 Basic framework

Multi-object tracking (MOT) divides the problem into two sub-tasks. The first is to detect objects in each frame; the second is to associate them across frames. The association task is primarily solved by exploiting strong cues, either explicitly or implicitly, namely spatial and appearance information.

1.2 Limitations of current methods

When two objects overlap heavily in the current frame, the intersection over union (IoU) between detections and the estimated tracklet positions becomes ambiguous, and the appearance features of both objects are dominated by the foreground object.

2 Hybrid-SORT

The current state-of-the-art SORT-like algorithm, OC-SORT, is modified to serve as the strong baseline. First, the velocity-direction modeling in OC-SORT, i.e., observation-centric momentum (OCM), is modified: the box center is extended to the four box corners, and the fixed time interval is extended to multiple time intervals. Second, an additional association stage for low-confidence detections, following ByteTrack, is added.

2.1 Weak condition modeling

2.1.1 Tracklet confidence modeling

Two additional states are added: the tracklet confidence c and its velocity component \dot{c}.

As shown in the figure below, the Kalman filter lags noticeably when estimating sudden changes in the confidence state, while the trend of the confidence state shows clear directionality.

Based on these characteristics, the paper uses simple linear prediction over the tracklet history to estimate the tracklet confidence.

The confidence cost is computed as the absolute difference between the tracklet confidence \widehat{c}_{trk} estimated according to Equation 4 and the detection confidence c_{det}.
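As a minimal sketch (function and variable names are illustrative, not from the official code), the confidence cost between all detections and tracklets can be computed with NumPy broadcasting:

```python
import numpy as np

def confidence_cost(det_scores, trk_scores):
    """Absolute difference between each detection confidence and each
    predicted tracklet confidence; result has shape (num_dets, num_trks)."""
    det_scores = np.asarray(det_scores, dtype=float).reshape(-1, 1)  # (num_dets, 1)
    trk_scores = np.asarray(trk_scores, dtype=float).reshape(1, -1)  # (1, num_trks)
    return np.abs(det_scores - trk_scores)
```

The resulting matrix can then be added, with a weight, to the spatial cost before matching.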

2.1.2 Height Modulated IoU(HMIOU)

Introducing the height state helps improve association:

(1) The height of an object reflects depth information to some extent, making the height state an effective cue for distinguishing highly overlapping objects.

(2) The height state is robust to different poses and can be estimated accurately, making it a high-quality representation of the object.

The formula is expressed as:
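The formula image is missing here; a reconstruction consistent with the hmiou implementation in Section 3.3 (with y_1, y_2 the top and bottom box coordinates) reads:

```latex
\mathrm{HIoU} = \frac{\min(y_2^{trk}, y_2^{det}) - \max(y_1^{trk}, y_1^{det})}
                     {\max(y_2^{trk}, y_2^{det}) - \min(y_1^{trk}, y_1^{det})},
\qquad
\mathrm{HMIoU} = \mathrm{HIoU} \cdot \mathrm{IoU}
```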

HIoU represents the height state, a weak cue, while IoU represents spatial information, a strong cue. HIoU is used to modulate IoU so that occluded or clustered objects are better discriminated.

2.2 Hybrid-SORT

2.2.1 Robust OCM

2.2.1.1 Limitations of the original OCM

The velocity-direction modeling of the original OCM is susceptible to noise caused by the fixed time interval and the sparse state (i.e., only the object center).

2.2.1.2 Robust OCM
  • First, extend the fixed time interval of 3 frames into a superposition of multiple time intervals from 1 to 3 frames;
  • Second, use the four corners of the object, rather than its center point, to compute the velocity direction.

This avoids matching errors that occur when, due to sudden pose changes, the tracklet's historical velocity direction and the direction from the tracklet to the detection center become nearly opposite.
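The two modifications can be sketched as follows; names are illustrative (the actual implementation uses the batched speed_direction_batch_* helpers shown in Section 3.2), and boxes are assumed to be (x1, y1, x2, y2):

```python
import numpy as np

def corner_velocity_directions(prev_boxes, cur_box):
    """Unit velocity directions of the four bbox corners (lt, rt, lb, rb),
    superposed over time intervals of 1..3 frames."""
    x1, y1, x2, y2 = cur_box
    cur = np.array([(x1, y1), (x2, y1), (x1, y2), (x2, y2)], dtype=float)
    recent = prev_boxes[-3:]                      # intervals 1, 2 and 3 frames back
    dirs = np.zeros((4, 2))
    for px1, py1, px2, py2 in recent:
        prev = np.array([(px1, py1), (px2, py1), (px1, py2), (px2, py2)], dtype=float)
        v = cur - prev                            # per-corner displacement
        norm = np.linalg.norm(v, axis=1, keepdims=True) + 1e-6
        dirs += v / norm                          # superpose normalized directions
    return dirs / len(recent)
```

Averaging normalized directions over several intervals and four corners damps the noise a single center-point, single-interval estimate would carry.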

 

2.2.2 Appearance modeling

Objects are first detected, and the resulting cropped patches are fed to the ReID model. An exponential moving average (EMA) is used to model the tracklet appearance, and cosine distance serves as the metric between tracklet and detection appearance features.

2.2.3 Algorithm architecture

The association consists of three stages: the first associates high-confidence detections; the second associates low-confidence detections (BYTE in ByteTrack); the third uses the last detection of each lost track to recover it (OCR in OC-SORT).
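The cascade can be sketched with a simple greedy matcher (an illustrative stand-in for the Hungarian matching used in practice; all names here are assumptions):

```python
import numpy as np

def greedy_match(cost, thresh):
    """Greedily assign each detection (row) to its cheapest unused track
    (column) when the cost is below the threshold."""
    matches, used = [], set()
    if cost.size == 0:
        return matches
    for d in range(cost.shape[0]):
        t = int(np.argmin(cost[d]))
        if t not in used and cost[d, t] < thresh:
            matches.append((d, t))
            used.add(t)
    return matches

# Sketch of the three-stage cascade:
# 1) greedy_match(cost_high, th)  -- high-confidence dets vs. all tracks
# 2) greedy_match(cost_low, th)   -- low-confidence dets vs. leftover tracks (BYTE)
# 3) greedy_match(cost_last, th)  -- leftover dets vs. last observations of lost tracks (OCR)
```

Each later stage only sees the detections and tracks left unmatched by the previous one.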

3 Code

3.1 Kalman filter KalmanBoxTracker modeling

3.1.1 Introducing tracklet confidence c and its velocity component \dot{c}

        if not orig:
          from .kalmanfilter_score_new import KalmanFilterNew_score_new as KalmanFilter_score_new
          self.kf = KalmanFilter_score_new(dim_x=9, dim_z=5)

3.1.2 Prediction of trajectory confidence

Simple linear prediction is used to estimate the tracklet confidence; from the code below, the prediction is \widehat{c} = c - (c_{pre} - c) = 2c - c_{pre}.

        if not self.confidence_pre:
            # No previous confidence yet: fall back to the current confidence
            return (self.history[-1],
                    np.clip(self.kf.x[3], self.args.track_thresh, 1.0),
                    np.clip(self.confidence, 0.1, self.args.track_thresh))
        else:
            # Linear prediction: c_hat = c - (c_pre - c) = 2c - c_pre
            return (self.history[-1],
                    np.clip(self.kf.x[3], self.args.track_thresh, 1.0),
                    np.clip(self.confidence - (self.confidence_pre - self.confidence),
                            0.1, self.args.track_thresh))

The return values are the nine-dimensional state prediction, the predicted confidence, and the velocity component of the confidence \dot{c}.

3.2 Robust OCM

3.2.1 The four corners replace its center point

lt, rt, lb, rb: the velocities of the four corner points of the bounding box

    Y1, X1 = speed_direction_batch_lt(detections, previous_obs)
    Y2, X2 = speed_direction_batch_rt(detections, previous_obs)
    Y3, X3 = speed_direction_batch_lb(detections, previous_obs)
    Y4, X4 = speed_direction_batch_rb(detections, previous_obs)
    cost_lt = cost_vel(Y1, X1, trackers, lt, detections, previous_obs, vdc_weight)
    cost_rt = cost_vel(Y2, X2, trackers, rt, detections, previous_obs, vdc_weight)
    cost_lb = cost_vel(Y3, X3, trackers, lb, detections, previous_obs, vdc_weight)
    cost_rb = cost_vel(Y4, X4, trackers, rb, detections, previous_obs, vdc_weight)

    angle_diff_cost = cost_lt + cost_rt + cost_lb + cost_rb

The speed_direction_batch_* functions compute the velocity directions of the four corner points.

cost_vel computes the velocity-direction cost for a given corner.
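As a simplified sketch of what such a per-corner cost looks like (the names and the exact normalization are assumptions, not the official cost_vel), the angle between the tracklet's historical direction and the tracklet-to-detection direction can be turned into a cost:

```python
import numpy as np

def angle_diff_cost(track_dirs, assoc_dirs, weight=0.05):
    """Velocity-direction cost for one corner.
    track_dirs: (T, 2) unit historical direction per track;
    assoc_dirs: (T, N, 2) unit directions from each track's previous
    observation to each detection. Returns a (T, N) cost matrix."""
    cos = np.sum(track_dirs[:, None, :] * assoc_dirs, axis=-1)  # cosine of angle
    angle = np.arccos(np.clip(cos, -1.0, 1.0))                  # in [0, pi]
    return weight * angle / np.pi                               # larger angle -> larger cost
```

Summing this cost over the four corners yields the angle_diff_cost term used above.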

3.3 Height Modulated IoU(HMIOU)

def hmiou(bboxes1, bboxes2):
    """Height-Modulated IoU: HMIoU = HIoU * IoU."""
    bboxes2 = np.expand_dims(bboxes2, 0)
    bboxes1 = np.expand_dims(bboxes1, 1)

    # HIoU: overlap of the two height ranges divided by their union
    yy11 = np.maximum(bboxes1[..., 1], bboxes2[..., 1])
    yy12 = np.minimum(bboxes1[..., 3], bboxes2[..., 3])
    yy21 = np.minimum(bboxes1[..., 1], bboxes2[..., 1])
    yy22 = np.maximum(bboxes1[..., 3], bboxes2[..., 3])
    o = (yy12 - yy11) / (yy22 - yy21)

    # Standard IoU over the full boxes
    xx1 = np.maximum(bboxes1[..., 0], bboxes2[..., 0])
    yy1 = np.maximum(bboxes1[..., 1], bboxes2[..., 1])
    xx2 = np.minimum(bboxes1[..., 2], bboxes2[..., 2])
    yy2 = np.minimum(bboxes1[..., 3], bboxes2[..., 3])
    w = np.maximum(0., xx2 - xx1)
    h = np.maximum(0., yy2 - yy1)
    wh = w * h
    # Modulate IoU by HIoU
    o *= wh / ((bboxes1[..., 2] - bboxes1[..., 0]) * (bboxes1[..., 3] - bboxes1[..., 1])
        + (bboxes2[..., 2] - bboxes2[..., 0]) * (bboxes2[..., 3] - bboxes2[..., 1]) - wh)
    return o


Origin blog.csdn.net/weixin_50862344/article/details/132176269