Learning the DeepSORT target tracking algorithm from scratch: a detailed explanation of principles and code

Table of contents

1. Main steps of target tracking

2. The process of traditional sort algorithm

3. DeepSORT algorithm process

4. Overall target tracking code

4.1 In the Configs file directory:

4.2 Under the deep_sort/deep_sort/deep directory:

4.3 Under the deep_sort/deep_sort/sort directory:

Run the demo:


DeepSORT (Deep learning based SORT) is a visual target tracking algorithm based on deep learning, which combines deep learning and the traditional target tracking algorithm SORT (Simple Online and Realtime Tracking).

DeepSORT detects targets in each video frame using an object detector (such as YOLO or Faster R-CNN), represents and describes each target using multi-feature fusion, and then tracks the targets with the SORT algorithm. On top of SORT, DeepSORT introduces a re-identification (Re-ID) model to solve the problem of assigning target IDs: the Re-ID model determines a target's unique ID by computing the similarity of the target across multiple frames.

The advantages of DeepSORT are high accuracy, strong robustness, and good tolerance of target occlusion and deformation. It is widely used in fields such as pedestrian and vehicle tracking and intelligent video surveillance.

1. Main steps of target tracking

  1. Get original video frames
  2. Use an object detector to detect objects in video frames
  3. Extract features from each detected target's box: appearance features (for feature comparison, to avoid ID switches) and motion features (for prediction by the Kalman filter)
  4. Compute the matching degree between targets in consecutive frames (using the Hungarian algorithm and cascade matching), and assign an ID to each tracked target
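The four steps above can be sketched as a loop. This is only an illustrative skeleton: `detect` and `extract_appearance` are placeholder stubs (not real APIs from any library), and the ID-assignment step is deliberately trivial where real DeepSORT would run Hungarian and cascade matching.

```python
def detect(frame):
    # Stub detector: would return boxes as (x, y, w, h, confidence).
    return [(10, 20, 50, 80, 0.9)]

def extract_appearance(frame, box):
    # Stub Re-ID feature: a fixed-length appearance vector per box.
    return [0.0] * 128

def track_video(frames):
    next_id = 0
    for frame in frames:                                       # step 1: raw frames
        boxes = detect(frame)                                  # step 2: detection
        feats = [extract_appearance(frame, b) for b in boxes]  # step 3: features
        # step 4: real DeepSORT matches boxes/feats to existing tracks
        # (Hungarian algorithm + cascade matching); this stub just
        # assigns every detection a fresh ID.
        ids = list(range(next_id, next_id + len(boxes)))
        next_id += len(boxes)
        yield ids

print(list(track_video(["frame0", "frame1"])))  # [[0], [1]]
```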

2. The process of traditional sort algorithm

        The predecessor of DeepSORT is the SORT algorithm. The core of SORT is the Kalman filter and the Hungarian algorithm.

        Role of the Kalman filter: it uses the current series of motion variables to predict the motion variables at the next moment; the filter's motion variables are initialized from the first detection result.
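For intuition, here is a minimal constant-velocity Kalman predict step. It is a simplified 2-D sketch, not DeepSORT's actual filter (which tracks an 8-D state: box center x, y, aspect ratio, height, and their velocities):

```python
import numpy as np

dt = 1.0
F = np.array([[1, dt],
              [0, 1]])          # state transition: position += velocity * dt
x = np.array([4.0, 2.0])        # [position, velocity], initialized from the first detection
P = np.eye(2)                   # state covariance
Q = 0.01 * np.eye(2)            # process noise

# Predict step: where should the target be in the next frame?
x = F @ x                       # predicted state
P = F @ P @ F.T + Q             # predicted covariance (uncertainty grows)
print(x)                        # position advanced by one velocity step
```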

        Role of the Hungarian algorithm: simply put, it solves the assignment problem: it assigns a set of detection boxes to the boxes predicted by the Kalman filter, so that each predicted box is paired with the detection box that best matches it, which achieves the tracking effect.
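In practice this assignment can be computed with SciPy's `linear_sum_assignment`, which solves the same minimum-cost matching problem. The cost matrix below is made-up example data:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical cost matrix: rows are Kalman-predicted track boxes,
# columns are detection boxes; entries are 1 - IoU (lower = better match).
cost = np.array([
    [0.1, 0.9, 0.8],
    [0.7, 0.2, 0.9],
])

row_ind, col_ind = linear_sum_assignment(cost)  # minimum-cost pairing
for t, d in zip(row_ind, col_ind):
    print(f"track {t} -> detection {d} (cost {cost[t, d]:.1f})")
```

Here track 0 pairs with detection 0 and track 1 with detection 1; detection 2 is left unmatched and would become a new Track.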

        The SORT workflow is shown in the figure below:

Detections are the boxes produced by the object detector; Tracks holds the track information.

(1) Create the corresponding Tracks from the detection results of the first frame. Initialize the Kalman filter's motion variables and predict the corresponding boxes with the Kalman filter.

(2) Match the boxes detected in the current frame one by one with the boxes predicted by the previous frame's Tracks, and compute the cost matrix from the IoU matching result (cost = 1 - IoU).

(3) Feed all the cost matrices obtained in (2) into the Hungarian algorithm to obtain the linear matching result. This yields three kinds of outcomes: unmatched Tracks, which are deleted directly; unmatched Detections, which are initialized as new Tracks; and detection boxes successfully paired with predicted boxes, which means the target was tracked across the two frames, so the corresponding Track variables are updated through the Kalman filter.

(4) Repeat steps (2)-(3) until the video ends.
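The 1 - IoU cost used in step (2) can be sketched directly; this is a generic IoU implementation for boxes in (x1, y1, x2, y2) format, not the repository's exact code:

```python
def iou(box_a, box_b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    xa, ya = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    xb, yb = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, xb - xa) * max(0, yb - ya)       # intersection area
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# Cost for the Hungarian algorithm: 1 - IoU, so a perfect overlap costs 0.
pred = (0, 0, 10, 10)   # box predicted by the Kalman filter
det  = (5, 0, 15, 10)   # detected box
print(1 - iou(pred, det))
```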

3. DeepSORT algorithm process

Since SORT is still a relatively rough tracking algorithm, it loses target IDs especially easily when an object is occluded. DeepSORT adds cascade matching and confirmation of new trajectories on top of SORT. Tracks are divided into a confirmed state and an unconfirmed state. Newly created Tracks start in the unconfirmed state; an unconfirmed Track must match Detections a certain number of times (3 by default) before it is converted to the confirmed state. A confirmed Track must go unmatched with Detections a certain number of times (30 by default) before it is deleted.

        The workflow of the Deepsort algorithm is shown in the figure below:

The workflow of the entire algorithm is as follows:

(1) Create the corresponding Tracks from the detection results of the first frame. Initialize the Kalman filter's motion variables and predict the corresponding boxes with the Kalman filter. The Tracks at this point are necessarily unconfirmed.

(2) Match the boxes detected in the current frame one by one with the boxes predicted by the previous frame's Tracks, and compute the cost matrix from the IoU matching result (cost = 1 - IoU).

(3) Feed all the cost matrices obtained in (2) into the Hungarian algorithm to obtain the linear matching result. This yields three kinds of outcomes: unmatched Tracks, which are deleted directly (because these Tracks are still unconfirmed; a confirmed Track is deleted only after going unmatched a certain number of times, 30 by default); unmatched Detections, which are initialized as new Tracks; and detection boxes successfully paired with predicted boxes, which means the target was tracked across the two frames, so the corresponding Track variables are updated through the Kalman filter.

(4) Repeat steps (2)-(3) until confirmed Tracks appear or the video ends.

(5) Predict the boxes of both the confirmed and the unconfirmed Tracks with the Kalman filter. Perform cascade matching between the boxes of the confirmed Tracks and the Detections (every time a Track is matched, the appearance features and motion information of its Detection are saved, the most recent 100 frames by default; these saved appearance features and motion information are used for cascade matching with Detections, because confirmed Tracks are more likely to match Detections).

(6) Cascade matching has three possible outcomes. The first is matched Tracks, whose Track variables are updated through the Kalman filter. The second and third are unmatched Detections and unmatched Tracks: these unmatched Tracks, together with the previously unconfirmed Tracks, are matched one by one against the unmatched Detections by IoU, and the cost matrix is computed from the IoU matching result (cost = 1 - IoU).

(7) Feed all the cost matrices obtained in (6) into the Hungarian algorithm to obtain the linear matching result. This yields three kinds of outcomes: unmatched Tracks, which are deleted directly (if unconfirmed; a confirmed Track is deleted only after going unmatched a certain number of times, 30 by default); unmatched Detections, which are initialized as new Tracks; and detection boxes successfully paired with predicted boxes, which means the target was tracked across the two frames, so the corresponding Track variables are updated through the Kalman filter.

(8) Repeat steps (5)-(7) until the video ends.
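The confirm/delete lifecycle described above can be sketched as a small state machine. This is a minimal illustration with assumed names (`tentative`, `mark_hit`, `mark_missed`), not the repository's `track.py`:

```python
class Track:
    """Minimal sketch of DeepSORT's track state machine."""

    def __init__(self, n_init=3, max_age=30):
        self.state = "tentative"       # unconfirmed until n_init hits
        self.hits = 0
        self.time_since_update = 0
        self.n_init = n_init
        self.max_age = max_age

    def mark_hit(self):
        """Called when the track is matched to a detection."""
        self.hits += 1
        self.time_since_update = 0
        if self.state == "tentative" and self.hits >= self.n_init:
            self.state = "confirmed"

    def mark_missed(self):
        """Called when the track goes unmatched in a frame."""
        self.time_since_update += 1
        if self.state == "tentative":
            self.state = "deleted"                    # unconfirmed: delete at once
        elif self.time_since_update > self.max_age:
            self.state = "deleted"                    # confirmed: delete after max_age misses

t = Track()
t.mark_hit(); t.mark_hit(); t.mark_hit()
print(t.state)   # a tentative track becomes confirmed after 3 matches
```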
 

4. Overall target tracking code

The following is an explanation of the functions of the important code for target tracking.

        First, the code is divided into three parts:

  1. Code and weights for target tracking
  2. Code and weights for target detection (the yolov5.5 detection algorithm is used here)
  3. The py files that call the detection and tracking code

For details on target detection, you can read other articles.

Here we mainly explain the code related to target tracking. The main py files are shown in the figure below; the role of each one is explained in turn.


4.1 In the Configs file directory:

deep_sort.yaml: this YAML file mainly stores the following parameters:

(1) The directory path containing the feature extraction weights;

(2) Maximum cosine distance, used in cascade matching; matches whose distance exceeds this threshold are ignored.

(3) Detection result confidence threshold

(4) Non-maximum suppression threshold; setting it to 1 disables suppression.

(5) Maximum IOU threshold

(6) Maximum lifetime: if the object is not tracked for MAX_AGE frames, the track is changed to the deleted state.

(7) The number of hits required for confirmation: once it is reached, the track's state is converted from unconfirmed to confirmed.

(8) The maximum number of frames of features to save; beyond this number, features are saved in a rolling buffer.
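As a concrete illustration, a deep_sort.yaml covering parameters (1)-(8) might look like the sketch below. The key names and values are assumptions based on common DeepSORT configurations and the defaults mentioned in this article, not necessarily this repository's exact file:

```yaml
DEEPSORT:
  REID_CKPT: "deep_sort/deep_sort/deep/checkpoint/ckpt.t7"  # (1) feature extraction weights
  MAX_DIST: 0.2          # (2) max cosine distance for cascade matching
  MIN_CONFIDENCE: 0.3    # (3) detection confidence threshold
  NMS_MAX_OVERLAP: 1.0   # (4) NMS threshold; 1.0 disables suppression
  MAX_IOU_DISTANCE: 0.7  # (5) max IoU distance
  MAX_AGE: 30            # (6) frames a lost track is kept before deletion
  N_INIT: 3              # (7) hits needed to confirm a track
  NN_BUDGET: 100         # (8) features kept per track (rolling buffer)
```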

4.2 Under the deep_sort/deep_sort/deep directory:

ckpt.t7: the weight file of the feature extraction network, produced by training that network. It is used to extract features from each target box during tracking, which helps avoid ID switches.
evaluate.py: Calculate the accuracy of the feature extraction model.

feature_extractor.py: Extract the features in the corresponding bounding box and obtain a fixed-dimensional feature as a representative of the bounding box for use in calculating similarity.

model.py: the feature extraction network model, used when training the feature extraction weights.

train.py: python file for training feature extraction network

test.py: Test the performance of the trained feature extraction network

4.3 Under the deep_sort/deep_sort/sort directory:

detection.py: stores a detection box produced by the detector, together with the box's confidence and its extracted features; it also provides conversions between box formats.

iou_matching.py: computes the IoU between boxes.

kalman_filter.py: the Kalman filter code, which mainly uses the Kalman filter to predict the trajectory information of the detection boxes.

linear_assignment.py: uses the Hungarian algorithm to match predicted track boxes with detection boxes for the best matching result.

nn_matching.py: computes the nearest-neighbor distance using Euclidean distance, cosine distance and other metrics.
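The cosine variant of this nearest-neighbor distance can be sketched as follows; this is a generic illustration (function names are mine), not the file's exact implementation:

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity; small when appearance features point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

def nn_cosine_distance(track_feats, det_feat):
    """Nearest-neighbor distance: smallest distance to any stored track feature."""
    return min(cosine_distance(f, det_feat) for f in track_feats)

gallery = [[1.0, 0.0], [0.6, 0.8]]              # features saved for one track
print(nn_cosine_distance(gallery, [0.6, 0.8]))  # identical feature -> 0.0
```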

preprocessing.py: non-maximum suppression code, which applies the NMS algorithm to output the optimal detection boxes.
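Greedy NMS can be sketched like this (a self-contained illustration, not the file's exact code). Note that with `max_overlap = 1` every box is kept, matching the "no suppression" setting described in the yaml parameters:

```python
def nms(boxes, scores, max_overlap):
    """Keep highest-score boxes; drop boxes overlapping a kept box by more than max_overlap."""
    def iou(a, b):
        xa, ya = max(a[0], b[0]), max(a[1], b[1])
        xb, yb = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, xb - xa) * max(0, yb - ya)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        return inter / (area_a + area_b - inter)

    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= max_overlap for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores, 0.5))  # the near-duplicate second box is suppressed
```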

track.py: mainly stores trajectory information, including the position and velocity of the track box, its ID, and its state; there are three states: confirmed, unconfirmed, and deleted.

tracker.py: stores all track information; it handles initialization on the first frame, Kalman filter prediction and update, and both cascade matching and IoU matching.

deep_sort/deep_sort/deep_sort.py: the overall wrapper for DeepSORT, producing the complete DeepSORT tracking pipeline.

deep_sort/utils: various utility scripts, such as box-drawing tools and log-saving tools.

Link: https://pan.baidu.com/s/1uORzJIav2z2SXMqaBfJ5pQ 
Extraction code: ztaw

Run the demo:


The next chapter explains how to train your own feature extraction network.


Origin blog.csdn.net/weixin_45303602/article/details/132721845