[Deep Learning] YOLOv5 + DeepSort for counting and pedestrian re-identification tracking


Foreword

Pedestrian re-identification is one of the basic tasks of computer vision. First, a detector locates the targets; the detected targets are then passed to a tracker, which identifies and tracks the same target across frames.
Based on this, we can use the technology for:
1. Counting vehicle and pedestrian flow with a single camera
2. Single-camera tracking (loitering detection)
3. Cross-camera tracking
Obviously, task 3 is harder than task 1. Task 2 is a continuation of task 1 that requires some additional techniques.


1. Knowledge system

1.1 Prerequisites

The predecessor of DeepSort is the Sort algorithm, whose core consists of the Kalman filter and the Hungarian algorithm.
Kalman filter: it uses the current set of motion variables to predict the motion variables at the next moment. The first detection result is used to initialize the filter's motion variables.
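To make the predict/update cycle concrete, here is a minimal sketch of a constant-velocity Kalman filter for a single coordinate (for example the x centre of a box). The class name, the noise values, and the one-dimensional state are assumptions chosen for illustration; Sort and DeepSort track a larger state vector per box.

```python
class ConstantVelocityKalman1D:
    """Tracks one coordinate with state [position, velocity] (illustrative only)."""

    def __init__(self, first_measurement, process_var=1e-2, measurement_var=1.0):
        # The first detection initialises the position; velocity is unknown, so 0.
        self.x = [float(first_measurement), 0.0]
        # Covariance: position as uncertain as a measurement, velocity very uncertain.
        self.P = [[measurement_var, 0.0], [0.0, 100.0]]
        self.q = process_var      # assumed process noise variance
        self.r = measurement_var  # assumed measurement noise variance

    def predict(self, dt=1.0):
        # State transition x' = F x with F = [[1, dt], [0, 1]].
        p, v = self.x
        self.x = [p + dt * v, v]
        # Covariance transition P' = F P F^T + Q, written out for the 2x2 case.
        (a, b), (c, d) = self.P
        self.P = [
            [a + dt * (b + c) + dt * dt * d + self.q, b + dt * d],
            [c + dt * d, d + self.q],
        ]
        return self.x[0]

    def update(self, z):
        # Only the position is measured, so H = [1, 0].
        y = z - self.x[0]                # innovation
        s = self.P[0][0] + self.r        # innovation variance
        k0, k1 = self.P[0][0] / s, self.P[1][0] / s  # Kalman gain
        self.x = [self.x[0] + k0 * y, self.x[1] + k1 * y]
        (a, b), (c, d) = self.P
        # P = (I - K H) P
        self.P = [[(1 - k0) * a, (1 - k0) * b], [c - k1 * a, d - k1 * b]]


# A target moving one unit per frame: positions 0, 1, 2, ..., 9.
kf = ConstantVelocityKalman1D(first_measurement=0.0)
for z in range(1, 10):
    kf.predict()
    kf.update(float(z))
predicted = kf.predict()  # prediction for the next frame
print(round(predicted, 2))
```

After a few frames the filter has learned the velocity, so the prediction for the next frame lands close to where the target will actually be.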
Hungarian algorithm: it solves the assignment problem. Here that means assigning a set of detection boxes to the boxes predicted by the Kalman filter, so that each predicted box is paired with the detection box that matches it best, which is what achieves tracking. In essence, it works on a cost matrix to solve the matching problem for the predicted boxes.
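The following sketch shows what "best assignment" means on a small cost matrix. It is deliberately a brute-force search over all permutations, not the Hungarian algorithm itself; the Hungarian algorithm finds the same minimum-cost assignment in polynomial time (in Python, `scipy.optimize.linear_sum_assignment` solves the same problem). The function name and the cost values are made up for illustration.

```python
from itertools import permutations


def optimal_assignment(cost):
    """Minimum-cost one-to-one assignment for a square cost matrix.

    Brute force over all permutations: fine for a 3x3 demo, far too slow
    for real use, where the Hungarian algorithm would be used instead.
    """
    n = len(cost)
    best_perm, best_total = None, float("inf")
    for perm in permutations(range(n)):
        total = sum(cost[row][col] for row, col in enumerate(perm))
        if total < best_total:
            best_perm, best_total = perm, total
    return list(enumerate(best_perm)), best_total


# Rows: boxes predicted by the Kalman filter. Columns: detection boxes.
cost = [
    [0.10, 0.20, 0.90],
    [0.15, 0.90, 0.80],
    [0.90, 0.85, 0.30],
]
pairs, total = optimal_assignment(cost)
print(pairs, round(total, 2))
```

Note that picking the cheapest cell first (row 0, column 0) would force row 1 into an expensive pairing. The optimal assignment instead pairs row 0 with column 1 and row 1 with column 0, which is why a global solver is used rather than a greedy choice.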

1.2 Workflow of Sort

[Figure: workflow diagram of the Sort algorithm]

Detections are the boxes produced by the object detector. Tracks hold the trajectory information.
The workflow of the entire algorithm is as follows:
(1) Create the corresponding Tracks from the detections in the first frame. Initialize the Kalman filter's motion variables and use the filter to predict each Track's box.

(2) Perform IoU matching between the boxes detected in the current frame and the boxes predicted by the Tracks from the previous frame, then compute the cost matrix from the IoU results (cost = 1 - IoU).

(3) Feed all the cost matrices from (2) into the Hungarian algorithm to obtain a linear assignment. This gives three kinds of results. The first is Unmatched Tracks: we delete them directly. The second is Unmatched Detections: we initialize each of them as a new Track. The third is a successful pairing of a detection box with a predicted box, which means the target was tracked from the previous frame to the current one; the matched Detection is used to update the corresponding Track's variables through the Kalman filter.

(4) Repeat steps (2)-(3) until the end of the video frame.
As I understand it, the prediction here is linear. Because the predictor is weak, Sort is prone to ID switches, and it considers only motion features, not appearance features. DeepSort improves on these problems.
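Steps (2) and (3) can be sketched as follows. The IoU and the 1 - IoU cost follow the description above; everything else is an assumption for illustration: the `(x1, y1, x2, y2)` box format, the `max_cost` gate of 0.7 (an IoU of at least 0.3), and especially the greedy cheapest-first pairing, which stands in for the Hungarian algorithm to keep the example short.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def associate(predicted, detected, max_cost=0.7):
    """Split one frame into matches, unmatched tracks and unmatched detections."""
    # Cost matrix: rows are predicted track boxes, columns are detections.
    cost = [[1.0 - iou(p, d) for d in detected] for p in predicted]
    # Greedy stand-in for the Hungarian algorithm: cheapest pairs first.
    candidates = sorted(
        (cost[t][d], t, d)
        for t in range(len(predicted))
        for d in range(len(detected))
    )
    matches, used_tracks, used_dets = [], set(), set()
    for c, t, d in candidates:
        if c > max_cost:
            break  # every remaining pair overlaps too little
        if t in used_tracks or d in used_dets:
            continue
        matches.append((t, d))
        used_tracks.add(t)
        used_dets.add(d)
    unmatched_tracks = [t for t in range(len(predicted)) if t not in used_tracks]
    unmatched_dets = [d for d in range(len(detected)) if d not in used_dets]
    return sorted(matches), unmatched_tracks, unmatched_dets


predicted_boxes = [(0, 0, 10, 10), (50, 50, 60, 60)]
detected_boxes = [(1, 1, 11, 11), (100, 100, 110, 110)]
matches, unmatched_tracks, unmatched_dets = associate(predicted_boxes, detected_boxes)
print(matches, unmatched_tracks, unmatched_dets)
```

In this example the first track is matched and would be updated through the Kalman filter, the second track is unmatched and would be deleted, and the second detection is unmatched and would start a new Track.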

1.3 DeepSort

On top of the Sort algorithm, DeepSort adds a matching cascade (Matching Cascade) and a confirmation step for new trajectories. Tracks are divided into a confirmed state and an unconfirmed state. Newly created Tracks are unconfirmed; an unconfirmed Track must be matched with Detections a certain number of times (3 by default) before it becomes confirmed. A confirmed Track is deleted only after it has failed to match any Detection a certain number of times in a row (30 by default).
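The track lifecycle described above can be sketched as a small state machine. The class and method names are hypothetical, and two details are assumptions: only matches after creation count toward the 3 needed for confirmation, and a confirmed Track is deleted on its 30th consecutive miss. Real implementations may count slightly differently.

```python
from enum import Enum


class TrackState(Enum):
    UNCONFIRMED = 1
    CONFIRMED = 2
    DELETED = 3


class Track:
    def __init__(self, track_id, n_init=3, max_age=30):
        self.track_id = track_id
        self.n_init = n_init      # matches needed before confirmation
        self.max_age = max_age    # consecutive misses tolerated once confirmed
        self.hits = 0
        self.misses_in_a_row = 0
        self.state = TrackState.UNCONFIRMED

    def mark_matched(self):
        """Call when a Detection was assigned to this Track in the current frame."""
        self.hits += 1
        self.misses_in_a_row = 0
        if self.state is TrackState.UNCONFIRMED and self.hits >= self.n_init:
            self.state = TrackState.CONFIRMED

    def mark_missed(self):
        """Call when no Detection was assigned to this Track in the current frame."""
        if self.state is TrackState.UNCONFIRMED:
            # Unconfirmed Tracks are dropped on their first miss.
            self.state = TrackState.DELETED
            return
        self.misses_in_a_row += 1
        if self.misses_in_a_row >= self.max_age:
            self.state = TrackState.DELETED


track = Track(track_id=1)
for _ in range(3):
    track.mark_matched()
print(track.state)   # confirmed after three matches
for _ in range(30):
    track.mark_missed()
print(track.state)   # deleted after thirty consecutive misses
```

The long grace period for confirmed Tracks is what lets a target survive a short occlusion without receiving a new ID.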
[Figure: workflow diagram of the DeepSort algorithm]
The workflow of the whole algorithm is as follows:

(1) Create the corresponding Tracks from the detections in the first frame. Initialize the Kalman filter's motion variables and use the filter to predict each Track's box. At this point all Tracks are necessarily unconfirmed.

(2) Perform IoU matching between the boxes detected in the current frame and the boxes predicted by the Tracks from the previous frame, then compute the cost matrix from the IoU results (cost = 1 - IoU).

(3) Feed all the cost matrices from (2) into the Hungarian algorithm to obtain a linear assignment. This gives three kinds of results. The first is Unmatched Tracks: we delete them directly (because these Tracks are unconfirmed; a confirmed Track is deleted only after it has gone unmatched a certain number of times in a row, 30 by default). The second is Unmatched Detections: we initialize each of them as a new Track. The third is a successful pairing of a detection box with a predicted box, which means the target was tracked from the previous frame to the current one; the matched Detection is used to update the corresponding Track's variables through the Kalman filter.

(4) Repeat steps (2)-(3) until the confirmed Tracks appear or the video frame ends.

(5) Use the Kalman filter to predict the boxes for both the confirmed and the unconfirmed Tracks. Then run cascade matching between the boxes of the confirmed Tracks and the Detections. (Whenever a Track is matched, the appearance features and motion information of the matched Detection are saved, by default for the most recent 100 frames. Cascade matching compares this saved appearance and motion information with the Detections. Confirmed Tracks are matched this way because they are more likely to match the Detections.)

(6) Cascade matching has three possible outcomes. The first is matched Tracks, whose variables are updated through the Kalman filter. The second and third are unmatched Detections and unmatched Tracks. The previously unconfirmed Tracks, together with the Tracks left unmatched by the cascade, are then IoU-matched against the Unmatched Detections, and the cost matrix is computed from the IoU results (cost = 1 - IoU).

(7) Feed all the cost matrices from (6) into the Hungarian algorithm to obtain a linear assignment. This gives three kinds of results. The first is Unmatched Tracks: unconfirmed Tracks are deleted directly, while a confirmed Track is deleted only after it has gone unmatched a certain number of times in a row (30 by default). The second is Unmatched Detections: we initialize each of them as a new Track. The third is a successful pairing of a detection box with a predicted box, which means the target was tracked from the previous frame to the current one; the matched Detection is used to update the corresponding Track's variables through the Kalman filter.

(8) Repeat steps (5)-(7) until the end of the video frame.
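The appearance side of step (5) can be sketched as follows, as a simplified illustration rather than the actual DeepSort code. Each confirmed Track keeps a gallery of its most recent appearance features (capped at 100), the cost of a Track/Detection pair is the smallest cosine distance to any feature in the gallery, and recently updated Tracks get first pick. All names, the 0.2 distance threshold, and the two-dimensional feature vectors are assumptions. The motion-based gating is omitted, and a greedy nearest-feature choice replaces the per-level Hungarian assignment.

```python
import math
from collections import deque


def cosine_distance(u, v):
    """1 - cosine similarity of two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


class ConfirmedTrack:
    def __init__(self, track_id, time_since_update=1, budget=100):
        self.track_id = track_id
        self.time_since_update = time_since_update
        # Only the most recent `budget` appearance features are kept.
        self.gallery = deque(maxlen=budget)

    def appearance_cost(self, feature):
        # Distance to the closest stored appearance of this track.
        return min(cosine_distance(f, feature) for f in self.gallery)


def cascade_match(tracks, det_features, max_distance=0.2):
    """Greedy simplification: most recently updated tracks choose first."""
    remaining = list(range(len(det_features)))
    matches, unmatched_tracks = [], []
    for track in sorted(tracks, key=lambda t: t.time_since_update):
        if not remaining or not track.gallery:
            unmatched_tracks.append(track.track_id)
            continue
        best = min(remaining, key=lambda d: track.appearance_cost(det_features[d]))
        if track.appearance_cost(det_features[best]) <= max_distance:
            matches.append((track.track_id, best))
            remaining.remove(best)
        else:
            unmatched_tracks.append(track.track_id)
    # Unmatched tracks and detections would go on to the IoU matching stage.
    return matches, unmatched_tracks, remaining


track_a = ConfirmedTrack(track_id=7, time_since_update=1)
track_a.gallery.append((1.0, 0.0))
track_b = ConfirmedTrack(track_id=8, time_since_update=5)
track_b.gallery.append((0.0, 1.0))
detections = [(0.1, 1.0), (1.0, 0.05), (-1.0, -1.0)]
matches, unmatched_tracks, unmatched_dets = cascade_match([track_a, track_b], detections)
print(matches, unmatched_tracks, unmatched_dets)
```

Here both Tracks find a Detection that looks like them, and the third Detection, which resembles neither, is passed on as unmatched.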

2. Practical application

The code is in my Git repository: https://github.com/justinge/yolov5-deepsort
Be sure to read the README there; with it, the code runs end to end. Counting with a single camera is easy.
It can also be used together with my previous article: https://blog.csdn.net/weixin_40293999/article/details/127811380
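As an illustration of why single-camera counting is easy once stable track IDs exist, here is a hypothetical line-crossing counter. It is not taken from the repository above; the class name, the `(track_id, box)` input format, and the horizontal counting line are all assumptions.

```python
class LineCounter:
    """Counts each track once when its box centre crosses a horizontal line."""

    def __init__(self, line_y):
        self.line_y = line_y
        self.last_y = {}      # track_id -> centre y in the previous frame
        self.counted = set()  # track ids that have already been counted
        self.down = 0
        self.up = 0

    def update(self, tracks):
        """tracks: iterable of (track_id, (x1, y1, x2, y2)) for the current frame."""
        for track_id, (x1, y1, x2, y2) in tracks:
            centre_y = (y1 + y2) / 2.0
            previous = self.last_y.get(track_id)
            self.last_y[track_id] = centre_y
            if previous is None or track_id in self.counted:
                continue
            if previous < self.line_y <= centre_y:
                self.down += 1
                self.counted.add(track_id)
            elif previous >= self.line_y > centre_y:
                self.up += 1
                self.counted.add(track_id)


counter = LineCounter(line_y=50)
# Frame 1: three tracked pedestrians.
counter.update([(1, (0, 30, 10, 50)), (2, (20, 60, 30, 80)), (3, (40, 0, 50, 20))])
# Frame 2: track 1 moves down across the line, track 2 moves up across it.
counter.update([(1, (0, 50, 10, 70)), (2, (20, 35, 30, 55)), (3, (40, 10, 50, 30))])
print(counter.down, counter.up)
```

Because each ID is counted at most once, a pedestrian who lingers near the line is not counted repeatedly, provided the tracker keeps the ID stable.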

Summary

Regarding cross-camera tracking, I haven't figured out a good approach yet, so I'll leave that open for now.

Origin blog.csdn.net/weixin_40293999/article/details/128888841