Basic introduction to pedestrian re-identification (ReID)

Preface

I recently started studying deep learning, but I tended to skim the surface: I quickly forgot what I had read and set aside whatever I did not understand. Given the pressure of coursework and graduation, I decided to start updating this blog, both to record and organize what I have learned and to push myself to make progress every day.
This article was written after taking the course posted by Dr. Luo Hao on bilibili. For anyone preparing to enter the field of pedestrian re-identification, I recommend his course. Although it was released a few years ago and some frameworks and knowledge have since been updated, it remains rare introductory material.
Link to Pedestrian Re-identification Course by Dr. Luo Hao, Zhejiang University

Definition

Pedestrian re-identification (person re-identification, ReID), also called pedestrian re-recognition, is widely regarded as a sub-problem of image retrieval. It uses computer vision techniques to determine whether a specific pedestrian is present in an image or video: given an image of a monitored pedestrian, the task is to retrieve images of that pedestrian captured by other devices. Pedestrian re-identification can compensate for the limited field of view of fixed cameras, and it can be combined with pedestrian detection and pedestrian tracking for use in video surveillance, intelligent security, and other fields.

Pedestrian Re-identification System

Figure 1 Pedestrian re-identification system
A complete pedestrian re-identification system includes the following parts:
1. Data

  • Original video frames
    These are the ordinary video images captured by camera equipment. For example, if the police are tracing a suspect's escape route, the original video frames are all the surveillance footage from around the crime scene.
  • Pedestrian image to be retrieved
    This is the image of the pedestrian we are looking for, which is the Probe input. In the example above, it is the image of the suspect.

2. Pedestrian re-identification system

  • Pedestrian detection
    This step detects the people who appear in the video. Before anyone can be re-identified, the system must first locate the pedestrians in each frame; the detected pedestrian images form the Gallery input. In academic research, pedestrian re-identification focuses mainly on the next part, and pedestrian detection mostly relies on existing frameworks.
  • Pedestrian re-identification
    This part extracts features from the Probe and Gallery images above. The features can be hand-crafted or extracted by a convolutional neural network. The system then measures the similarity between images and ranks the Gallery images by similarity.
    Figure 2 Pedestrian re-identification
In more detail, the pedestrian re-identification step consists of the following parts:
  • Feature extraction: learn features that are robust to changes in a person's appearance across different cameras.
  • Metric learning: map the learned features to a new space in which images of the same person are closer together and images of different people are farther apart.
  • Image retrieval (matching): sort by the distance between image features and return the retrieval results.
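
The retrieval step above can be illustrated with a minimal sketch. It assumes that features have already been extracted (the toy 2-D vectors below stand in for the output of a real feature extractor) and that Euclidean distance on L2-normalized features is used; the function names `l2_normalize` and `rank_gallery` are hypothetical and chosen only for this example.

```python
import numpy as np


def l2_normalize(features: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so that distances are comparable."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    return features / np.maximum(norms, 1e-12)


def rank_gallery(probe_feat: np.ndarray, gallery_feats: np.ndarray) -> np.ndarray:
    """Return gallery indices sorted from most to least similar to the probe."""
    probe = l2_normalize(probe_feat[None, :])
    gallery = l2_normalize(gallery_feats)
    dists = np.linalg.norm(gallery - probe, axis=1)  # Euclidean distance
    return np.argsort(dists, kind="stable")


# Toy 2-D features standing in for the output of a feature extractor.
probe_feat = np.array([1.0, 0.0])
gallery_feats = np.array([
    [0.0, 1.0],  # index 0: far from the probe
    [0.9, 0.1],  # index 1: closest to the probe
    [0.5, 0.5],  # index 2: in between
])

ranking = rank_gallery(probe_feat, gallery_feats)
print(ranking)
```

In a real system the features would come from a trained network, and the distance could be replaced by a learned metric.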

Data set

Data sets usually consist of pedestrian images cropped by manual annotation or by a detection algorithm. Current ReID research treats recognition independently of detection and focuses on recognition.
• A data set is divided into a training set, a validation set, a Query set, and a Gallery set.
• The model is trained on the training set. Features are then extracted from the Query and Gallery images and their similarity is computed; for each query, the top N most similar images in the Gallery are returned.
• The identities in the training set and the test set do not overlap.
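
As a rough illustration of the split described above, the sketch below divides a toy list of (image name, person ID) pairs so that training and test identities do not overlap, and then separates the test part into Query and Gallery. The helper names, the 50% ratio, and the "first image per identity becomes the query" rule are assumptions made for this example; real data sets ship with their own predefined splits.

```python
import random


def split_by_identity(samples, test_ratio=0.5, seed=0):
    """Split (image_name, person_id) pairs so train and test share no identity."""
    ids = sorted({pid for _, pid in samples})
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_test = max(1, int(len(ids) * test_ratio))
    test_id_set = set(ids[:n_test])
    train = [s for s in samples if s[1] not in test_id_set]
    test = [s for s in samples if s[1] in test_id_set]
    return train, test


def split_query_gallery(test):
    """Use the first image of each test identity as Query, the rest as Gallery."""
    seen, query, gallery = set(), [], []
    for name, pid in test:
        if pid not in seen:
            seen.add(pid)
            query.append((name, pid))
        else:
            gallery.append((name, pid))
    return query, gallery


# Toy data: 6 identities with 3 images each (file names are made up).
samples = [(f"img_{pid}_{k}.jpg", pid) for pid in range(6) for k in range(3)]

train, test = split_by_identity(samples)
query, gallery = split_query_gallery(test)

train_ids = {pid for _, pid in train}
test_ids = {pid for _, pid in test}
print(sorted(train_ids), sorted(test_ids), len(query), len(gallery))
```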
Existing data sets can be divided into two categories:

Single frame data set

Single frame means that the collected images are individual, non-consecutive pictures. When labeling, each picture is a separate sample labeled with a person ID.

Sequence data set

Compared with single-frame data sets, the pictures in a sequence data set show continuous motion. Unlike the single-frame case, a whole group of pictures (a sequence) is labeled with one ID.
By the way, here is a website that collects the data sets commonly used in pedestrian re-identification; the most commonly used ones are those listed above: pedestrian re-identification data sets.

Common evaluation indicators

1. rank-k: in the ranked list returned by the algorithm, if the search target appears within the top k results, it is called a rank-k hit.
2. Cumulative Match Characteristic (CMC) curve: computes the rank-k hit rate for each k, forming a curve of accuracy against rank.
3. mAP (mean average precision): reflects how close to the front of the ranked list all the correct images of the queried person in the database are placed, so it measures the performance of a ReID algorithm more comprehensively.
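
The indicators above can be computed from a query-to-gallery distance matrix. The following is a simplified sketch, not a reference implementation: the function name `evaluate` and the toy distances are assumptions, and the camera-based filtering that standard benchmark protocols apply (dropping gallery images taken by the same camera as the query) is omitted.

```python
import numpy as np


def evaluate(dist, query_ids, gallery_ids, max_rank=5):
    """Compute the CMC curve (rank-1..max_rank hit rates) and mAP.

    dist[i, j] is the distance between query i and gallery image j.
    """
    dist = np.asarray(dist)
    query_ids = np.asarray(query_ids)
    gallery_ids = np.asarray(gallery_ids)
    max_rank = min(max_rank, len(gallery_ids))

    cmc_hits = np.zeros(max_rank)
    average_precisions = []
    for i, qid in enumerate(query_ids):
        order = np.argsort(dist[i], kind="stable")  # most similar first
        matches = gallery_ids[order] == qid
        if not matches.any():
            continue  # this query has no correct image in the gallery
        first_hit = int(np.argmax(matches))
        if first_hit < max_rank:
            cmc_hits[first_hit:] += 1  # a rank-k hit for every k > first_hit
        hit_ranks = np.flatnonzero(matches) + 1  # 1-based ranks of correct images
        precisions = np.arange(1, len(hit_ranks) + 1) / hit_ranks
        average_precisions.append(precisions.mean())

    if not average_precisions:
        raise ValueError("no query has a matching gallery image")
    n_valid = len(average_precisions)
    return cmc_hits / n_valid, float(np.mean(average_precisions))


# Toy example: 2 queries, 4 gallery images, 2 identities.
query_ids = [0, 1]
gallery_ids = [0, 1, 0, 1]
dist = [
    [0.1, 0.5, 0.3, 0.9],  # query 0: both correct images are ranked first
    [0.2, 0.6, 0.1, 0.4],  # query 1: correct images are ranked third and fourth
]
cmc, mean_ap = evaluate(dist, query_ids, gallery_ids, max_rank=4)
print(cmc, mean_ap)
```

In this toy case rank-1 accuracy is 0.5 because only the first query has a correct image in first place, while mAP also penalizes the second query for placing both correct images late in the list.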

Evaluation Mode

1. Single shot vs multi shot: single shot means that each person has one image in the gallery (N=1), and multi shot means that each person has N>1 images in the gallery. At the same rank-k, the recognition rate generally increases as N grows.
2. Single query vs multi query: single query means that each person has one image in the probe (N=1), while multi query means that each person has N>1 images in the probe; the features of the N images are then merged (by max pooling or average pooling) into the final feature. At the same rank-k, the recognition rate generally increases as N grows.
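
The multi-query merging step can be sketched as follows, assuming the N probe features of one person are stacked as the rows of an array. The function name `fuse_query_features` and the toy values are hypothetical.

```python
import numpy as np


def fuse_query_features(features: np.ndarray, mode: str = "avg") -> np.ndarray:
    """Merge N probe features of shape (N, D) into one feature of shape (D,)."""
    if mode == "avg":
        return features.mean(axis=0)  # average pooling over the N images
    if mode == "max":
        return features.max(axis=0)   # max pooling over the N images
    raise ValueError(f"unknown pooling mode: {mode}")


# Two probe images of the same person, each with a 2-D toy feature.
probe_features = np.array([
    [1.0, 4.0],
    [3.0, 2.0],
])
avg_feature = fuse_query_features(probe_features, "avg")
max_feature = fuse_query_features(probe_features, "max")
print(avg_feature, max_feature)
```

The fused feature is then used in place of a single probe feature when ranking the gallery.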

Pedestrian re-identification method

Traditional methods: hand-crafted features + distance metrics

  • Hand-crafted features:
    • Color space: RGB, HSV, LAB, XYZ, YCbCr, ELF, ELF16
    • Texture space: LBP, Gabor
    • Local features: SIFT, HOG, SURF
    • Special features: LDFV, ColorInv, SDALF, LOMO
  • Distance metrics:
    • Common distances: Euclidean distance, Mahalanobis distance, cosine distance
    • Metric learning: LFDA, MFA, LMNN, LADF, XQDA, KISSME
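
For reference, the three common distances listed above can be written in a few lines. This is a sketch with hypothetical function names; the matrix `M` in the Mahalanobis distance is assumed to be a positive semi-definite matrix supplied by the caller (metric learning methods such as KISSME estimate such a matrix from labeled pairs, which is not shown here).

```python
import numpy as np


def euclidean_distance(x: np.ndarray, y: np.ndarray) -> float:
    """Straight-line distance between two feature vectors."""
    return float(np.linalg.norm(x - y))


def cosine_distance(x: np.ndarray, y: np.ndarray) -> float:
    """One minus the cosine of the angle between two feature vectors."""
    cos_sim = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    return float(1.0 - cos_sim)


def mahalanobis_distance(x: np.ndarray, y: np.ndarray, M: np.ndarray) -> float:
    """sqrt((x - y)^T M (x - y)) for a positive semi-definite matrix M."""
    d = x - y
    return float(np.sqrt(d @ M @ d))


x = np.array([1.0, 0.0])
y = np.array([0.0, 1.0])

d_euc = euclidean_distance(x, y)
d_cos = cosine_distance(x, y)
d_mah_identity = mahalanobis_distance(x, y, np.eye(2))
d_mah_weighted = mahalanobis_distance(x, y, np.diag([4.0, 1.0]))
print(d_euc, d_cos, d_mah_identity, d_mah_weighted)
```

With `M` equal to the identity matrix the Mahalanobis distance reduces to the Euclidean distance; a different `M` reweights the feature dimensions.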

Deep learning methods

This section summarizes deep-learning-based pedestrian re-identification methods. By training loss, they can be divided into representation learning and metric learning; by whether local features are considered, into global-feature-based and local-feature-based methods; by type of data, into single-frame-image-based and video-sequence-based methods. In addition, there is a class of GAN-based methods. (These methods will be covered later.)

  • Representation learning methods
  • Metric learning methods
  • Local-feature-based methods
  • Video-sequence-based methods
  • GAN-based methods
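
To give a feel for what a metric learning loss looks like, here is a minimal sketch of the triplet loss, one commonly used example of such a loss. It is not taken from this article or the course; the function name, the margin value of 0.3, and the toy vectors are assumptions for illustration only.

```python
import numpy as np


def triplet_loss(anchor, positive, negative, margin=0.3):
    """Penalize the case where the same-person pair is not closer than the
    different-person pair by at least `margin`."""
    d_ap = np.linalg.norm(anchor - positive)  # anchor vs same identity
    d_an = np.linalg.norm(anchor - negative)  # anchor vs different identity
    return float(max(d_ap - d_an + margin, 0.0))


anchor = np.array([0.0, 0.0])
near = np.array([0.0, 1.0])  # distance 1 from the anchor
far = np.array([3.0, 4.0])   # distance 5 from the anchor

# Same person is already much closer than the other person: no penalty.
loss_good = triplet_loss(anchor, positive=near, negative=far)
# Same person is farther away than the other person: large penalty.
loss_bad = triplet_loss(anchor, positive=far, negative=near)
print(loss_good, loss_bad)
```

Minimizing a loss of this kind pulls features of the same person together and pushes features of different people apart, which matches the goal of metric learning described earlier.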

Visualization

Visualization is essentially a kind of cluster analysis of the recognized pictures.
Okay, that is the end of this post. Friends who are studying this direction are welcome to get in touch and discuss.

Origin blog.csdn.net/qq_37747189/article/details/109551330