Pedestrian Re-identification (ReID), Part 1: Person_reID_baseline_pytorch

Foreword

Recently, one of our projects required identifying personnel trajectories. Obtaining such data by traditional means, such as mobile-phone base-station positioning, proved difficult, so we decided to use image recognition instead. As it turns out, ReID technology is well suited to this task.


1. Definition of ReID

1. What is ReID

Simply put, given an image of pedestrian A, we need to find the other images of pedestrian A in a candidate set of images. ReID technology plays a very important role in practical scenarios.

Using ReID, we can build pedestrian trajectories in a surveillance system and apply them to various downstream tasks. For example, in a community surveillance system, once we lock onto suspect A at a certain moment, ReID can automatically find every picture of that suspect across the community's cameras, reconstruct his movement trajectory, and ultimately assist the police in making an arrest. As another example, in smart-retail scenarios, ReID can describe each consumer's path through a shopping mall and the dwell time in each area, which helps optimize customer flow and assists product recommendation.

The ReID algorithm can be broken down into the following three steps:

  1. Feature extraction: given a query image and a large number of database images (gallery images), extract a semantic feature for each of them. In this feature space, the distance between pictures of the same person should be as small as possible, and the distance between pictures of different people as large as possible. Mainstream ReID algorithms use a deep convolutional neural network (CNN, such as ResNet50) to extract features.
  2. Distance calculation: after obtaining the query features and gallery features, calculate the distance between the query image and each database image, usually the Euclidean or cosine distance.
  3. Ranking and retrieval: given the distances, sort the gallery samples and return the final matches, either by applying a distance threshold or by taking the K nearest neighbors (see the sketch after this list). Quicksort is typically used, with complexity O(N log N), where N is the number of pictures in the database.
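To make steps 2 and 3 concrete, here is a minimal sketch in PyTorch. It assumes features have already been extracted and L2-normalized; the tensor names (query_feat, gallery_feats) and the value of k are illustrative, not part of the baseline's code.

```python
import torch

# Toy stand-ins for extracted features: 1 query and 1000 gallery images,
# each represented by a 512-dim feature vector.
query_feat = torch.randn(512)
gallery_feats = torch.randn(1000, 512)

# L2-normalize so that the dot product equals cosine similarity.
query_feat = query_feat / query_feat.norm()
gallery_feats = gallery_feats / gallery_feats.norm(dim=1, keepdim=True)

# Step 2: distance calculation (cosine similarity; higher = closer).
scores = gallery_feats @ query_feat        # shape: (1000,)

# Step 3: ranking -- return the K nearest gallery images.
k = 10
topk_scores, topk_indices = torch.topk(scores, k)
print(topk_indices)                        # indices of the 10 best matches
```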

2. reid_baseline

reid_baseline (Person_reID_baseline_pytorch) is a small, beginner-friendly, and powerful ReID baseline built on PyTorch. Its performance is comparable to the best published methods (powerful), it supports fp16 training with as little as 2 GB of GPU memory (small), and it provides an 8-minute quick-start tutorial (novice friendly). The baseline was released by Dr. Zhedong Zheng in 2017, and its GitHub star count has since exceeded 2k.

2. Preparation

1. Environment

The project relies on a PyTorch environment. One configured earlier can be reused here; for the configuration method, refer to the article "Image Recognition (2): configuring a PyTorch environment with Anaconda and running YOLOv5".

2. Code

Project address: https://github.com/layumi/Person_reID_baseline_pytorch

3. Data

Download address: Market-1501

Introduction to the dataset:
The Market-1501 dataset was collected on the campus of Tsinghua University, shot in summer, and constructed and released in 2015. It includes 32,217 images of 1,501 pedestrians captured by 6 cameras (5 high-resolution cameras and 1 low-resolution camera). The image resolution is unified to 128×64. Each pedestrian is captured by at least 2 cameras and may have multiple images under a single camera.
The training set bounding_box_train has 751 identities and 12,936 images, an average of 17.2 training images per person. The test set bounding_box_test has 750 identities and 19,732 images, an average of 26.3 test images per person. The query set query has 3,368 query images.
The fixed training and test splits provided with the dataset can be used in both single-shot and multi-shot testing settings.
Reference article: "Pedestrian Re-identification: Market-1501 Dataset Introduction".
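As a quick sanity check after downloading, the split sizes above can be verified by counting files and identities on disk. A minimal sketch, assuming the archive was extracted to ./Market (use whatever local path you chose). Note that bounding_box_test also contains junk (-1) and distractor (0000) images, so its raw identity count will be slightly higher than 750.

```python
import os

root = './Market'  # assumed local path to the extracted Market-1501 dataset

for split in ['bounding_box_train', 'bounding_box_test', 'query']:
    folder = os.path.join(root, split)
    jpgs = [f for f in os.listdir(folder) if f.endswith('.jpg')]
    # Market-1501 filenames start with the person ID, e.g. 0002_c1s1_000451_03.jpg
    ids = {f.split('_')[0] for f in jpgs}
    print(f'{split}: {len(jpgs)} images, {len(ids)} identities')
```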

3. Training

1. Generate training data

After the Market-1501 dataset is downloaded and decompressed, the extracted folder contains the bounding_box_train, bounding_box_test, and query directories, among others.
To prepare the training data, open prepare.py and change the dataset path on the fifth line to your local path (an example follows below). Then run prepare.py; it generates a pytorch folder. Enter the pytorch folder to check the generated subfolders.
Now we have successfully prepared the images for later training.
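For reference, the line to edit in prepare.py looks roughly like this; the exact variable name and original placeholder path may differ between versions of the repo, so treat this as an assumption and check your local copy:

```python
# prepare.py, near the top of the file:
download_path = './Market'  # change this to wherever you extracted Market-1501
```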

2. Start training

We can enter the following command to start training:

python train.py --gpu_ids 0 --name ft_ResNet50 --train_all --batchsize 32  --data_dir your_data_path

After modifying --data_dir to point at the local dataset path, the command becomes:

python train.py --gpu_ids 0 --name ft_ResNet50 --train_all --batchsize 32  --data_dir ./Market/pytorch/

Training runs for 60 epochs by default; the default parameters can be modified in the train.py file.
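For example, to change the number of epochs, one option is to edit the corresponding argparse default in train.py. A hypothetical excerpt; the flag name and its location vary between versions of the repo, so verify against your local copy:

```python
import argparse

# Hypothetical excerpt from train.py -- check the actual flag name in your copy.
parser = argparse.ArgumentParser(description='Training')
parser.add_argument('--total_epoch', default=60, type=int,
                    help='number of training epochs')
opt = parser.parse_args()
```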

4. Test

1. Feature extraction

In this part, we load the model we just trained and extract the visual features of each image:

python test.py --gpu_ids 0 --name ft_ResNet50 --test_dir your_data_path  --batchsize 32 --which_epoch 59

After modifying --test_dir to point at the local dataset path:

python test.py --gpu_ids 0 --name ft_ResNet50 --test_dir ./Market/pytorch/  --batchsize 32 --which_epoch 59

--gpu_ids: which GPU to run on.

--name: the directory name of the trained model.

--batchsize: batch size.

--which_epoch: select the i-th saved model.

--test_dir: the path of the testing data.
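test.py saves the extracted features to a file that the evaluation script reads back; in the repo this is a .mat file named pytorch_result.mat. A minimal sketch of inspecting it; the key names below follow the repo's evaluation script, but treat them as assumptions and check your local copy:

```python
import scipy.io
import torch

# Load the features saved by test.py (file name used by the repo's scripts).
result = scipy.io.loadmat('pytorch_result.mat')

query_feature = torch.FloatTensor(result['query_f'])      # (num_query, feat_dim)
gallery_feature = torch.FloatTensor(result['gallery_f'])  # (num_gallery, feat_dim)
query_label = result['query_label'][0]
gallery_label = result['gallery_label'][0]

print(query_feature.shape, gallery_feature.shape)
```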

2. Evaluation

Now we have a feature for each image. All we need to do is match images by comparing their features:

python evaluate_gpu.py

The evaluation reports mAP: 0.7.

5. Simple visualization

To visualize the results, run:

python demo.py --query_index 600

--query_index: which query you want to test. You may select a number in the range 0 to 3367.
The demo shows the 10 most similar pictures for the chosen query, and there are many applicable scenarios. In an actual personnel-trajectory application, the camera number and shooting time of the returned pictures can be used to describe the behavior trajectory of the target person.
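As an illustration of that last point, Market-1501 filenames encode the camera and frame: 0001_c1s1_001051_00.jpg means person 0001, camera 1, sequence 1, frame 001051. A minimal sketch of turning ranked matches into a crude trajectory; the filenames in top_matches are illustrative, not real output:

```python
import re

# Illustrative ranked matches for one query (best first); in practice these
# would be the filenames of the top-K gallery images returned by the demo.
top_matches = ['0001_c1s1_001051_00.jpg',
               '0001_c3s2_002250_00.jpg',
               '0001_c5s3_004401_00.jpg']

for name in top_matches:
    m = re.match(r'(\d+)_c(\d)s(\d)_(\d+)_\d+\.jpg', name)
    if m:
        pid, cam, seq, frame = m.groups()
        print(f'person {pid} seen by camera {cam} (sequence {seq}) at frame {frame}')
```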

6. Summary

Following the tutorial to experience reid_baseline is relatively simple, so this post just records the steps. I am still studying other ReID methods and hope to make breakthroughs in the future.

Origin: blog.csdn.net/h363924219/article/details/124324690