【MOT】FairMOT multi-target tracking 2021 (installation + code interpretation)



Paper: FairMOT: On the Fairness of Detection and Re-Identification in Multiple Object Tracking (http://arxiv.org/abs/2004.01888)

Code: https://github.com/ifzhang/FairMOT


0. Installation steps

1. First create a virtual environment and install the corresponding package

conda create -n FairMOT
conda activate FairMOT
conda install pytorch==1.7.0 torchvision==0.8.0 cudatoolkit=10.2 -c pytorch
cd ${FAIRMOT_ROOT}
pip install cython
pip install -r requirements.txt

2. Compile DCNv2 (deformable convolution)

Here you must use pytorch 1.7.1 or 1.7.0, because the DCNv2 code provided by the author is built for these two torch versions.
Clone link: https://github.com/ifzhang/DCNv2/tree/pytorch_1.7

git clone -b pytorch_1.7 https://github.com/ifzhang/DCNv2.git
cd DCNv2
./make.sh

3. Install ffmpeg (for testing)

ffmpeg is mainly used to convert between sequences of image frames and video:
1. Download code: http://ffmpeg.org/download.html
2. Install yasm

sudo apt-get install yasm

3. Install sdl1.2

sudo apt-get install libsdl1.2-dev

4. Install sdl2.0

sudo apt-get install libsdl2-dev

5. Compile and install ffmpeg

Go to the ffmpeg folder after decompression, and execute the following commands in sequence:

./configure
make
sudo make install

6. Test whether the installation is successful

ffmpeg -version
ffplay -version

7. Basic ffmpeg usage

Extract video frames with ffmpeg: https://blog.csdn.net/weixin_43804210/article/details/107964643
Compress video with ffmpeg: https://blog.csdn.net/weixin_43804210/article/details/108109386

Note: a sequence of frame images can also be converted to video with opencv-python. See:
https://blog.csdn.net/kxh123456/article/details/121692474
(This method may read consecutive frames out of order, so you need to add sorting code yourself. If you want the complete code, send me a private message.)
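
The out-of-order problem mentioned above usually comes from plain string sorting, where "img10" sorts before "img2". A minimal sketch of a numeric-aware sort key follows; the file names are hypothetical, and the sorted list would then be passed to whatever code writes the video.

```python
import re

def natural_key(name):
    """Split a file name into text and integer chunks so 'img10' sorts after 'img2'."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]

# Hypothetical frame names, in the arbitrary order a directory listing may return
frames = ["img10.jpg", "img2.jpg", "img1.jpg", "img21.jpg"]

plain = sorted(frames)                      # string order: img1, img10, img2, img21
ordered = sorted(frames, key=natural_key)   # numeric order: img1, img2, img10, img21
print(ordered)
```

Zero-padding the frame numbers when the images are written (for example 00001.jpg) avoids the problem altogether.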

1. Read the first frame

The code is as follows (track.py, line 55):

    for path, img, img0 in dataloader:
        timer.tic()
        blob = torch.from_numpy(img).cuda().unsqueeze(0)              # ([1, 3, 608, 1088])
        online_targets = tracker.update(blob, img0)

online_targets now holds 56 results, which are then visualized. Let's look at what tracker.update does.

1. Detection

The network produces four outputs on the 4x-downsampled feature map: hm (heat map, 1 channel), wh (box size, 2 channels), id_feature (512 channels), and reg (center offset, 2 channels). After decoding and post-processing, 56 detections are obtained.

output = self.model(im_blob)[-1]          # hm (1,1,152,272), wh (1,2,152,272), id (1,512,152,272), reg (1,2,152,272)
hm = output['hm'].sigmoid_()
wh = output['wh']
id_feature = output['id']
id_feature = F.normalize(id_feature, dim=1)

reg = output['reg'] if self.opt.reg_offset else None
dets, inds = mot_decode(hm, wh, reg=reg, cat_spec_wh=self.opt.cat_spec_wh, K=self.opt.K)   # (1,128,6) (1,128)

dets = self.post_process(dets, meta)                 # (1,128,6)  -->  dets[1]: (128,5)
dets = self.merge_outputs([dets])[1]
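
To make the decoding step more concrete, here is a minimal pure-Python sketch of peak picking on a heat map. It assumes mot_decode follows the usual CenterNet-style recipe (keep cells that are the maximum of their 3x3 neighbourhood, then take the top K); the function name topk_peaks and the toy heat map are hypothetical, and the real code works on GPU tensors and also applies wh and reg.

```python
def topk_peaks(hm, k):
    """Return up to k (score, y, x) local maxima of a 2-D heat map, best first."""
    h, w = len(hm), len(hm[0])
    peaks = []
    for y in range(h):
        for x in range(w):
            # 3x3 neighbourhood around (y, x), clipped at the borders
            neigh = [hm[j][i]
                     for j in range(max(0, y - 1), min(h, y + 2))
                     for i in range(max(0, x - 1), min(w, x + 2))]
            if hm[y][x] == max(neigh):
                peaks.append((hm[y][x], y, x))
    peaks.sort(reverse=True)
    return peaks[:k]

# Toy 4x4 heat map (already passed through a sigmoid)
hm = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.3, 0.0],
    [0.1, 0.3, 0.2, 0.6],
    [0.0, 0.0, 0.1, 0.2],
]
peaks = topk_peaks(hm, k=5)

# The feature map is 4x smaller than the network input (608x1088 -> 152x272),
# so a peak at (y, x) maps back to roughly (x * 4, y * 4) in input pixels.
stride = 4
centers = [(x * stride, y * stride) for _, y, x in peaks]
print(peaks, centers)
```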

2. Tracking

Go through the following 2 steps in sequence.

Step 2: First association, with embedding
Step 3: Second association, with IOU

Since self.tracked_stracks and self.lost_stracks are still empty in the first frame, the two association steps are skipped and new tracks are created directly:

        for inew in u_detection:
            track = detections[inew]
            if track.score < self.det_thresh:
                continue
            track.activate(self.kalman_filter, self.frame_id)
            activated_starcks.append(track)

The u_detection and detections here are the detection results just now. The activate code is as follows:

    def activate(self, kalman_filter, frame_id):
        """Start a new tracklet"""
        self.kalman_filter = kalman_filter
        self.track_id = self.next_id()
        self.mean, self.covariance = self.kalman_filter.initiate(self.tlwh_to_xyah(self._tlwh))   # (8) (8,8)

        self.tracklet_len = 0
        self.state = TrackState.Tracked          # 1
        if frame_id == 1:
            self.is_activated = True
        #self.is_activated = True
        self.frame_id = frame_id
        self.start_frame = frame_id

Visualization follows. Tracking starts from the detections of the first frame: each newly created track has state = TrackState.Tracked and is_activated = True. (Note: a detection that first appears in a later frame is not activated when it is created; it is only activated once it is matched again in a following frame.)
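
The activation rule is easy to miss, so here is a minimal sketch of the bookkeeping only. ToyTrack is a hypothetical stand-in for STrack without the Kalman filter or features; the state values 1, 2, 3 follow the walkthrough, while New = 0 and the assumption that update() sets is_activated = True are mine.

```python
from enum import IntEnum

class TrackState(IntEnum):
    New = 0        # assumed value; the walkthrough only shows 1, 2 and 3
    Tracked = 1
    Lost = 2
    Removed = 3

class ToyTrack:
    """Hypothetical stand-in for STrack that keeps only the state flags."""

    def __init__(self):
        self.state = TrackState.New
        self.is_activated = False
        self.frame_id = 0

    def activate(self, frame_id):
        self.state = TrackState.Tracked
        # Only detections from the very first frame are trusted immediately
        self.is_activated = (frame_id == 1)
        self.frame_id = frame_id

    def update(self, frame_id):
        # A successful match in a later frame confirms the track
        self.state = TrackState.Tracked
        self.is_activated = True
        self.frame_id = frame_id

first = ToyTrack()
first.activate(frame_id=1)     # created in frame 1: activated at once

late = ToyTrack()
late.activate(frame_id=2)      # created in frame 2: Tracked, but unconfirmed
unconfirmed_after_creation = not late.is_activated
late.update(frame_id=3)        # matched again in frame 3: now confirmed
```

This is why a track created after frame 1 lands in unconfirmed in the following frame and is left out of output_stracks until it is matched.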

2. Second frame

1. Detection of the second frame

This step is omitted here because the code is the same as above. Detection yields 57 results: dets, detections, inds.

2. Tracking

    unconfirmed = []
    tracked_stracks = []  # type: list[STrack]
    for track in self.tracked_stracks:              # the 56 results from the previous frame
        if not track.is_activated:
            unconfirmed.append(track)               # not executed
        else:
            tracked_stracks.append(track)           # executed

Step 2: First association, with embedding (feature distance between targets)

1. dists = matching.embedding_distance(strack_pool, detections)

    strack_pool = joint_stracks(tracked_stracks, self.lost_stracks)
    # Predict the current location with KF
    #for strack in strack_pool:
        #strack.predict()
    STrack.multi_predict(strack_pool)          # Kalman predict step (the initial velocity is zero, so the predicted mean equals the old one here)
    dists = matching.embedding_distance(strack_pool, detections)
    # Cosine distance between the features of the 56 tracks and the 57 detections

matching.embedding_distance computes the cosine distance between the tracked targets and the new detections. (np.float is an alias of the builtin float; NumPy 1.24 removed it, so newer environments need dtype=float.)

def embedding_distance(tracks, detections, metric='cosine'):
    cost_matrix = np.zeros((len(tracks), len(detections)), dtype=np.float)
    det_features = np.asarray([track.curr_feat for track in detections], dtype=np.float)   # (57,512)
    track_features = np.asarray([track.smooth_feat for track in tracks], dtype=np.float) # (56,512)
    cost_matrix = np.maximum(0.0, cdist(track_features, det_features, metric))  # Normalized features
    return cost_matrix                # (56, 57)
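
As a small worked example of what the cost matrix contains, the following pure-Python sketch computes the same quantity as cdist(..., 'cosine'), namely 1 - u.v / (|u| |v|), clipped at zero. The 2-dimensional feature vectors are hypothetical; real id features are 512-dimensional.

```python
import math

def cosine_distance_matrix(track_feats, det_feats):
    """Rows are tracks, columns are detections; entries are cosine distances."""
    def dist(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        return max(0.0, 1.0 - dot / norm)
    return [[dist(t, d) for d in det_feats] for t in track_feats]

# Hypothetical features: 2 tracks, 3 detections
track_feats = [[1.0, 0.0], [0.0, 1.0]]
det_feats = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]

cost = cosine_distance_matrix(track_feats, det_feats)
for row in cost:
    print([round(c, 3) for c in row])
```

Identical directions give 0, orthogonal directions give 1, and the vector length does not matter, which is why [0, 2] matches the second track perfectly.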

2. dists = matching.fuse_motion(self.kalman_filter, dists, strack_pool, detections)

def fuse_motion(kf, cost_matrix, tracks, detections, only_position=False, lambda_=0.98):
    if cost_matrix.size == 0:
        return cost_matrix
    gating_dim = 2 if only_position else 4            # 4
    gating_threshold = kalman_filter.chi2inv95[gating_dim]           # 9.4877
    measurements = np.asarray([det.to_xyah() for det in detections])    # (57,4)
    for row, track in enumerate(tracks):
        gating_distance = kf.gating_distance(
            track.mean, track.covariance, measurements, only_position, metric='maha')
        # Gating (Mahalanobis) distance between the track's predicted state and each detection, shape (57,)

        cost_matrix[row, gating_distance > gating_threshold] = np.inf
        cost_matrix[row] = lambda_ * cost_matrix[row] + (1 - lambda_) * gating_distance
        # Weighted sum: 0.98 * cosine distance + 0.02 * gating_distance
    return cost_matrix                                      # (56, 57)
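
The effect of the gate and the 0.98 / 0.02 weighting on one row of the cost matrix can be sketched in plain Python. The helper name fuse_row and the input numbers are hypothetical; the threshold 9.4877 is the 95% chi-square quantile for 4 degrees of freedom used above.

```python
import math

def fuse_row(cos_row, gating_row, gating_threshold=9.4877, lambda_=0.98):
    """Combine appearance and motion costs for one track against all detections."""
    fused = []
    for cos_d, gate_d in zip(cos_row, gating_row):
        if gate_d > gating_threshold:
            # Motion says this pairing is implausible: forbid it outright
            fused.append(math.inf)
        else:
            fused.append(lambda_ * cos_d + (1 - lambda_) * gate_d)
    return fused

# One track against three detections (made-up distances)
cos_row = [0.10, 0.20, 0.05]      # cosine distances
gating_row = [1.0, 4.0, 25.0]     # squared Mahalanobis distances

row = fuse_row(cos_row, gating_row)
print(row)
```

The third detection has the best appearance score but is gated out by motion, so it can never be matched to this track in the first association.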

3. matches, u_track, u_detection = matching.linear_assignment(dists, thresh=0.7)

This gives 56 matches, no u_track, and one u_detection (index 56).

def linear_assignment(cost_matrix, thresh):
    matches, unmatched_a, unmatched_b = [], [], []
    cost, x, y = lap.lapjv(cost_matrix, extend_cost=True, cost_limit=thresh)
    # cost: 1.63  x: (56,)  y: (57,)

    for ix, mx in enumerate(x):
        if mx >= 0:
            matches.append([ix, mx])
    unmatched_a = np.where(x < 0)[0]        # none
    unmatched_b = np.where(y < 0)[0]        # [56]: entry 56 is -1, no match was found for it
    matches = np.asarray(matches)
    return matches, unmatched_a, unmatched_b     # (56, 2)  []  [56]

lap.lapjv solves the linear assignment problem (the same task as Hungarian matching, using the Jonker-Volgenant algorithm). The input is an n*m cost matrix, and three values are returned:
c: the cost of the assignment (the smaller the better); it is not returned if return_cost is False.
x: an array of size n, giving the column each row is assigned to (-1 if unassigned).
y: an array of size m, giving the row each column is assigned to (-1 if unassigned).
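
To show how x and y turn into matches and unmatched indices, here is a small sketch that does not call lap at all. The x and y values are hypothetical (3 tracks, 4 detections) and mirror the second-frame situation, where every track is matched and one detection is left over.

```python
def split_assignment(x, y):
    """x[i] is the column assigned to row i, y[j] is the row assigned to column j; -1 means unassigned."""
    matches = [(i, j) for i, j in enumerate(x) if j >= 0]
    unmatched_rows = [i for i, j in enumerate(x) if j < 0]
    unmatched_cols = [j for j, i in enumerate(y) if i < 0]
    return matches, unmatched_rows, unmatched_cols

# Hypothetical solver output for 3 tracks (rows) and 4 detections (columns)
x = [2, 0, 1]          # track 0 -> det 2, track 1 -> det 0, track 2 -> det 1
y = [1, 2, 0, -1]      # det 3 is assigned to nobody

matches, u_track, u_detection = split_assignment(x, y)
print(matches, u_track, u_detection)
```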
4. Update the matched tracks:

  for itracked, idet in matches:                        # loop over the 56 matched targets
      track = strack_pool[itracked]
      det = detections[idet]
      if track.state == TrackState.Tracked:
          track.update(detections[idet], self.frame_id)  # update the Kalman filter (and the feature) with the matched detection
          activated_starcks.append(track)
      else:
          track.re_activate(det, self.frame_id, new_id=False)
          refind_stracks.append(track)



Step 3: Second association, with IOU

u_track is empty, so r_tracked_stracks and dists are both empty.

    detections = [detections[i] for i in u_detection]           # 1 detection (original index 56): OT_0_(0-0)
    r_tracked_stracks = [strack_pool[i] for i in u_track if strack_pool[i].state == TrackState.Tracked]   # []
    dists = matching.iou_distance(r_tracked_stracks, detections)                # []
    matches, u_track, u_detection = matching.linear_assignment(dists, thresh=0.5)    # matches: []  u_track: ()  u_detection: (0,)
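
For reference, the IoU distance used in the second association is simply 1 - IoU. The sketch below uses continuous coordinates and boxes in (x1, y1, x2, y2) format; it is not the repository's implementation, which may use a different pixel convention, and the boxes are made up.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union

def iou_distance(track_boxes, det_boxes):
    """Cost matrix with rows = tracks and columns = detections."""
    return [[1.0 - iou(t, d) for d in det_boxes] for t in track_boxes]

track_boxes = [(0, 0, 10, 10)]
det_boxes = [(0, 0, 10, 10), (5, 0, 15, 10), (20, 20, 30, 30)]

d = iou_distance(track_boxes, det_boxes)
empty = iou_distance([], det_boxes)   # no remaining tracks -> empty cost matrix
print(d, empty)
```

With thresh=0.5, the half-overlapping box (distance about 0.667) would be rejected, and an empty track list gives an empty matrix, as in the frame above.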

Step 3.5: Deal with unconfirmed tracks

Nothing happens here because unconfirmed is empty.

    detections = [detections[i] for i in u_detection]                         # 1 detection (original index 56)
    dists = matching.iou_distance(unconfirmed, detections)        # unconfirmed is [], so dists is []
    matches, u_unconfirmed, u_detection = matching.linear_assignment(dists, thresh=0.7)    # matches: []  u_unconfirmed: ()  u_detection: (0,)

Step 4: Init new stracks

Create a new track for each detection remaining in u_detection.

    for inew in u_detection:
        track = detections[inew]
        if track.score < self.det_thresh:
            continue
        track.activate(self.kalman_filter, self.frame_id)
        activated_starcks.append(track)

Step 5: Update state

Not executed because self.lost_stracks is empty.

3. Update the entire tracker

        self.tracked_stracks = [t for t in self.tracked_stracks if t.state == TrackState.Tracked]   # 56 (the previous ones)
        self.tracked_stracks = joint_stracks(self.tracked_stracks, activated_starcks)                   # 57 (plus the new detection)
        self.tracked_stracks = joint_stracks(self.tracked_stracks, refind_stracks)                          # unchanged: refind_stracks is empty
        self.lost_stracks = sub_stracks(self.lost_stracks, self.tracked_stracks)                        # none
        self.lost_stracks.extend(lost_stracks)                                                                                         # none
        self.lost_stracks = sub_stracks(self.lost_stracks, self.removed_stracks)                     # none
        self.removed_stracks.extend(removed_stracks)                                                                   # none
        self.tracked_stracks, self.lost_stracks = remove_duplicate_stracks(self.tracked_stracks, self.lost_stracks)

        output_stracks = [track for track in self.tracked_stracks if track.is_activated]        # 56
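
The list bookkeeping above can be read as set operations keyed by track id. The sketch below assumes that joint_stracks behaves like a union and sub_stracks like a difference on track_id; the helper names and the dict-based tracks are hypothetical stand-ins, not the repository's code.

```python
def joint_by_id(a, b):
    """Union of two track lists, keeping the first occurrence of each track_id."""
    seen = set()
    out = []
    for t in list(a) + list(b):
        if t["track_id"] not in seen:
            seen.add(t["track_id"])
            out.append(t)
    return out

def sub_by_id(a, b):
    """Tracks of a whose track_id does not appear in b."""
    drop = {t["track_id"] for t in b}
    return [t for t in a if t["track_id"] not in drop]

# Hypothetical tracks represented as small dicts
tracked = [{"track_id": 1}, {"track_id": 2}, {"track_id": 3}]
activated = [{"track_id": 2}, {"track_id": 3}, {"track_id": 4}]
lost = [{"track_id": 3}, {"track_id": 9}]

tracked = joint_by_id(tracked, activated)   # ids 1, 2, 3, 4
lost = sub_by_id(lost, tracked)             # id 3 is tracked again, only 9 stays lost

tracked_ids = [t["track_id"] for t in tracked]
lost_ids = [t["track_id"] for t in lost]
print(tracked_ids, lost_ids)
```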

3. The third frame

The detection result is 55 targets

        detections = [STrack(STrack.tlbr_to_tlwh(tlbrs[:4]), tlbrs[4], f, 30) for
                                  (tlbrs, f) in zip(dets[:, :5], id_feature)]                    # 55

        unconfirmed = []
        tracked_stracks = []  # type: list[STrack]
        for track in self.tracked_stracks:                          # the 57 targets from the previous frame
            if not track.is_activated:
                unconfirmed.append(track)                         # one target
            else:
                tracked_stracks.append(track)                  # 56 targets from the previous frame
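
The box formats involved here are worth spelling out: detections arrive as tlbr (x1, y1, x2, y2), STrack stores tlwh (x, y, w, h), and the Kalman filter is initialised from xyah (center x, center y, aspect ratio w/h, height). A minimal sketch of the two conversions, written as free functions with a made-up box:

```python
def tlbr_to_tlwh(box):
    """(x1, y1, x2, y2) -> (top-left x, top-left y, width, height)."""
    x1, y1, x2, y2 = box
    return (x1, y1, x2 - x1, y2 - y1)

def tlwh_to_xyah(box):
    """(x, y, w, h) -> (center x, center y, aspect ratio w/h, height)."""
    x, y, w, h = box
    return (x + w / 2, y + h / 2, w / h, h)

tlbr = (100.0, 50.0, 140.0, 150.0)   # hypothetical pedestrian box
tlwh = tlbr_to_tlwh(tlbr)
xyah = tlwh_to_xyah(tlwh)
print(tlwh, xyah)
```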

Step 2: First association, with embedding

        strack_pool = joint_stracks(tracked_stracks, self.lost_stracks)    # 56

        STrack.multi_predict(strack_pool)
        dists = matching.embedding_distance(strack_pool, detections)       # ( 56,55)
        #dists = matching.gate_cost_matrix(self.kalman_filter, dists, strack_pool, detections)
        dists = matching.fuse_motion(self.kalman_filter, dists, strack_pool, detections)
        matches, u_track, u_detection = matching.linear_assignment(dists, thresh=0.7)   # (55, 2)  [52]  []

Then process matches, u_track, and u_detection in turn. First, update the 55 matched tracks:

        for itracked, idet in matches:
            track = strack_pool[itracked]
            det = detections[idet]
            if track.state == TrackState.Tracked:
                track.update(detections[idet], self.frame_id)
                activated_starcks.append(track)                                    # 55
            else:
                track.re_activate(det, self.frame_id, new_id=False)
                refind_stracks.append(track)                                           # none

Step 3: Second association, with IOU

Calculate the iou distance between the extra (unpaired) detection results and the extra (unpaired) trackers:

        detections = [detections[i] for i in u_detection]             # empty: the extra (unmatched) detections
        r_tracked_stracks = [strack_pool[i] for i in u_track if strack_pool[i].state == TrackState.Tracked]   # u_track: [52]
        dists = matching.iou_distance(r_tracked_stracks, detections)                # []
        matches, u_track, u_detection = matching.linear_assignment(dists, thresh=0.5)    # matches: ()  u_track: (0,)  u_detection: ()

Handle unpaired trackers (tracking results do not match detection results)

        for it in u_track:
            track = r_tracked_stracks[it]                              # the one track left unmatched
            if not track.state == TrackState.Lost:             # executed
                track.mark_lost()                                               # track.state = 2
                lost_stracks.append(track)                           # lost_stracks grows from [] to one element

Process the track that was not activated in the previous frame (that is, the one detection among the 57 in frame 2 that was not matched to any of the 56 tracks from frame 1):

        detections = [detections[i] for i in u_detection]                                                # []
        dists = matching.iou_distance(unconfirmed, detections)                            # []
        matches, u_unconfirmed, u_detection = matching.linear_assignment(dists, thresh=0.7)    # matches: ()  u_unconfirmed: (0,)  u_detection: ()

        for it in u_unconfirmed:
            track = unconfirmed[it]
            track.mark_removed()                                      # track.state changes from 1 to 3
            removed_stracks.append(track)                  # removed_stracks grows from [] to one element

Step 4: Init new stracks. Not executed, since u_detection is empty.
Step 5: Update state. Not executed, since self.lost_stracks is empty.

Update the entire tracker

        self.tracked_stracks = [t for t in self.tracked_stracks if t.state == TrackState.Tracked]   # 55
        self.tracked_stracks = joint_stracks(self.tracked_stracks, activated_starcks)                   # 55
        self.tracked_stracks = joint_stracks(self.tracked_stracks, refind_stracks)                          # 55, unchanged: refind_stracks is empty
        self.lost_stracks = sub_stracks(self.lost_stracks, self.tracked_stracks)                        # none
        self.lost_stracks.extend(lost_stracks)                                                                                         # adds one: the unmatched track [OT_53_(1-2)]
        self.lost_stracks = sub_stracks(self.lost_stracks, self.removed_stracks)                     # unchanged, still 1
        self.removed_stracks.extend(removed_stracks)                                                                   # one: the extra detection from frame 2
        self.tracked_stracks, self.lost_stracks = remove_duplicate_stracks(self.tracked_stracks, self.lost_stracks)
        # 55 tracked, 1 lost

        output_stracks = [track for track in self.tracked_stracks if track.is_activated]        # 55

4. The fourth frame

The detection result is 55 targets.
Before each tracking starts, initialize first:

        self.frame_id += 1
        activated_starcks = []
        refind_stracks = []
        lost_stracks = []
        removed_stracks = []

unconfirmed is empty and tracked_stracks has 55 entries:

        unconfirmed = []
        tracked_stracks = []  
        for track in self.tracked_stracks:
            if not track.is_activated:
                unconfirmed.append(track)
            else:
                tracked_stracks.append(track)

Step 2: First association, with embedding

strack_pool has 56 entries (the 55 tracked ones plus the one in self.lost_stracks from the previous frame):

        strack_pool = joint_stracks(tracked_stracks, self.lost_stracks)    # 56

        STrack.multi_predict(strack_pool)
        dists = matching.embedding_distance(strack_pool, detections)       # ( 56,55)
        dists = matching.fuse_motion(self.kalman_filter, dists, strack_pool, detections)    # ( 56,55)
        matches, u_track, u_detection = matching.linear_assignment(dists, thresh=0.7)   # (55, 2)  [34]  []

Handle the 55 matched tracks. Afterwards activated_starcks holds 54 tracks and refind_stracks holds 1 ([53]):

        for itracked, idet in matches:
            track = strack_pool[itracked]
            det = detections[idet]
            if track.state == TrackState.Tracked:
                track.update(detections[idet], self.frame_id)
                activated_starcks.append(track)
            else:
                track.re_activate(det, self.frame_id, new_id=False)
                refind_stracks.append(track)                                                 # the track that was in self.lost_stracks after the previous frame

Step 3: Second association, with IOU

detections is empty, u_track has 1 entry, and r_tracked_stracks has 1 entry:

        detections = [detections[i] for i in u_detection]           # [] (u_detection is empty)
        r_tracked_stracks = [strack_pool[i] for i in u_track if strack_pool[i].state == TrackState.Tracked]   # 1 track (u_track: [34])
        dists = matching.iou_distance(r_tracked_stracks, detections)                # []
        matches, u_track, u_detection = matching.linear_assignment(dists, thresh=0.5)    # matches: []  u_track: (0,)  u_detection: ()

Tracks left in u_track (those with no matching detection) are marked as lost:

        for it in u_track:
            track = r_tracked_stracks[it]
            if not track.state == TrackState.Lost:
                track.mark_lost()
                lost_stracks.append(track)

Deal with unconfirmed tracks (usually tracks with only one beginning frame): unconfirmed is empty in this frame, so nothing is done.

Step 4: Init new stracks

Because there is no u_detection , skip this step and go directly to step 5:

Step 5: Update state

Note that self.lost_stracks holds the track left unmatched in the previous frame; it is not the local lost_stracks list built in this frame:

        for track in self.lost_stracks:
            if self.frame_id - track.end_frame > self.max_time_lost:        # has the track been lost for more than 15 frames?
                track.mark_removed()
                removed_stracks.append(track)                                                   # not executed
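
The removal rule is a plain age check on lost tracks. A minimal sketch, using the limit of 15 frames quoted in the walkthrough; the track id and end_frame values are hypothetical and tracks are represented as dicts for brevity.

```python
def expired(lost_tracks, frame_id, max_time_lost):
    """Lost tracks whose last matched frame is more than max_time_lost frames ago."""
    return [t for t in lost_tracks if frame_id - t["end_frame"] > max_time_lost]

# Hypothetical lost track that was last matched in frame 3
lost_tracks = [{"track_id": 35, "end_frame": 3}]

at_frame_4 = expired(lost_tracks, frame_id=4, max_time_lost=15)     # 1 frame old: kept
at_frame_18 = expired(lost_tracks, frame_id=18, max_time_lost=15)   # exactly 15: still kept
at_frame_19 = expired(lost_tracks, frame_id=19, max_time_lost=15)   # 16 > 15: removed
print(len(at_frame_4), len(at_frame_18), len(at_frame_19))
```

Until the limit is exceeded, a lost track stays in the pool for the first association, which is what lets it be re-found by appearance a few frames later.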

Update the entire tracker

        self.tracked_stracks = [t for t in self.tracked_stracks if t.state == TrackState.Tracked]   # 54
        self.tracked_stracks = joint_stracks(self.tracked_stracks, activated_starcks)                  # 54
        self.tracked_stracks = joint_stracks(self.tracked_stracks, refind_stracks)                         # 55

        self.lost_stracks = sub_stracks(self.lost_stracks, self.tracked_stracks)                              # []
        self.lost_stracks.extend(lost_stracks)                                                                                               # 1: [35]
        self.lost_stracks = sub_stracks(self.lost_stracks, self.removed_stracks)                           # 1: [35]
        self.removed_stracks.extend(removed_stracks)                                                                         # 1: [57]
        self.tracked_stracks, self.lost_stracks = remove_duplicate_stracks(self.tracked_stracks, self.lost_stracks) # 55 tracked, lost: [35]
        # get scores of lost tracks
        output_stracks = [track for track in self.tracked_stracks if track.is_activated]               # 55

The track that was in self.lost_stracks after the previous frame was matched again in this frame, so it went into refind_stracks and back into self.tracked_stracks.

5. The fifth frame

Detection yields 56 targets. The previous frame left 55 tracked targets, and unconfirmed is empty:

        unconfirmed = []
        tracked_stracks = []  
        for track in self.tracked_stracks:
            if not track.is_activated:
                unconfirmed.append(track)
            else:
                tracked_stracks.append(track)

Step 2: First association, with embedding

        strack_pool = joint_stracks(tracked_stracks, self.lost_stracks)    # 56

        STrack.multi_predict(strack_pool)
        dists = matching.embedding_distance(strack_pool, detections)            # ( 56,56)
        dists = matching.fuse_motion(self.kalman_filter, dists, strack_pool, detections)     # ( 56,56)
        matches, u_track, u_detection = matching.linear_assignment(dists, thresh=0.7)    # (55, 2)  [55]  [48]

Update the 55 matched tracks:

        for itracked, idet in matches:
            track = strack_pool[itracked]
            det = detections[idet]
            if track.state == TrackState.Tracked:
                track.update(detections[idet], self.frame_id)
                activated_starcks.append(track)                                               # 55
            else:
                track.re_activate(det, self.frame_id, new_id=False)
                refind_stracks.append(track)                                                     # []

Step 3: Second association, with IOU

        detections = [detections[i] for i in u_detection]           # 1 detection: OT_0_(0-0)
        r_tracked_stracks = [strack_pool[i] for i in u_track if strack_pool[i].state == TrackState.Tracked]    # [] (the unmatched pool entry is the lost track, not a Tracked one)
        dists = matching.iou_distance(r_tracked_stracks, detections)                # []
        matches, u_track, u_detection = matching.linear_assignment(dists, thresh=0.5)    # matches: []  u_track: ()  u_detection: (0,)

Deal with unconfirmed tracks, usually tracks with only one beginning frame:

        detections = [detections[i] for i in u_detection]                                                      # 1 detection
        dists = matching.iou_distance(unconfirmed, detections)                                 # []
        matches, u_unconfirmed, u_detection = matching.linear_assignment(dists, thresh=0.7)    # matches: []  u_unconfirmed: ()  u_detection: (0,)
        for itracked, idet in matches:                                                                                          # not executed
            unconfirmed[itracked].update(detections[idet], self.frame_id)
            activated_starcks.append(unconfirmed[itracked])
        for it in u_unconfirmed:                                                                                                # not executed
            track = unconfirmed[it]
            track.mark_removed()           # track.state = 1  --> track.state = 3
            removed_stracks.append(track)

Step 4: Init new stracks

One new track is appended to activated_starcks with state = 1 (Tracked). Because frame_id is not 1, its is_activated stays False:

        for inew in u_detection:
            track = detections[inew]
            if track.score < self.det_thresh:
                continue
            track.activate(self.kalman_filter, self.frame_id)
            activated_starcks.append(track)

Update the entire tracker

        self.tracked_stracks = [t for t in self.tracked_stracks if t.state == TrackState.Tracked]       # 55
        self.tracked_stracks = joint_stracks(self.tracked_stracks, activated_starcks)                       # 56
        self.tracked_stracks = joint_stracks(self.tracked_stracks, refind_stracks)                              # 56 (refind_stracks is empty)
        self.lost_stracks = sub_stracks(self.lost_stracks, self.tracked_stracks)
        self.lost_stracks.extend(lost_stracks)
        self.lost_stracks = sub_stracks(self.lost_stracks, self.removed_stracks)
        self.removed_stracks.extend(removed_stracks)
        self.tracked_stracks, self.lost_stracks = remove_duplicate_stracks(self.tracked_stracks, self.lost_stracks)
        # get scores of lost tracks
        output_stracks = [track for track in self.tracked_stracks if track.is_activated]               # 55 (one of the 56 tracks is not activated yet, so it is excluded)


Origin blog.csdn.net/qq_45752541/article/details/124022426