Paper Reading Notes: Cross-Modality Paired-Images Generation for RGB-Infrared Person Re-Identification (Part 1)

This paper was published at AAAI 2020 and belongs to the cross-modality person re-identification (ReID) category. It offers a new idea for solving the ReID problem with a GAN.
The author information of the paper:
[Figure: author information]

Summary

RGB-Infrared person re-identification is very challenging because of the large modality discrepancy between RGB and IR images. The key to solving cross-modality ReID is to learn aligned features between the RGB and IR modalities. However, since there are no correspondence labels between individual RGB and IR images, most existing methods try to reduce the modality discrepancy with set-level alignment. Enforcing alignment only between the entire sets, however, may cause misalignment between individual instances, which limits RGB-IR ReID performance. This paper therefore proposes a method that generates cross-modality paired images and jointly considers set-level and instance-level alignment.

  1. The method disentangles modality-specific features from modality-invariant features to achieve set-level alignment (here, modality-invariant features cover content information such as pose, gender, clothing type, and carried items, while modality-specific features cover style information such as clothing/shoe color and texture). Compared with traditional methods, directly discarding the modality-specific features can effectively reduce the discrepancy between the modalities.
  2. Given a person's unpaired cross-modality images, the method generates cross-modality paired images by decoding exchanged features. With the generated images, instance-level alignment can be achieved by minimizing the distance between the images in each pair (an illustrative form of the two objectives is sketched below).
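
As a rough illustration (my own notation, not the paper's exact losses), the two kinds of alignment can be written as objectives of the following form, where E denotes a modality-invariant feature encoder, x_i^{rgb} and x_j^{ir} are real images, and \hat{x}_i^{ir} is the generated IR-style counterpart of x_i^{rgb}:

```latex
% Illustrative objectives only -- notation invented for these notes.
% Set-level alignment: bring the two modality feature distributions close,
% written here as a simple mean-feature matching term (the paper may use a
% different distribution-level criterion).
\mathcal{L}_{\text{set}} =
  \Big\| \frac{1}{N}\sum_{i=1}^{N} E(x_i^{rgb})
       - \frac{1}{M}\sum_{j=1}^{M} E(x_j^{ir}) \Big\|_2^2

% Instance-level alignment: pull each image towards its generated
% cross-modality pair.
\mathcal{L}_{\text{ins}} =
  \frac{1}{N}\sum_{i=1}^{N} \big\| E(x_i^{rgb}) - E(\hat{x}_i^{ir}) \big\|_2^2
```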

Method motivation

[Figure 1]
Illustration of set-level and instance-level alignment. (a) There is a large gap between the RGB set and the IR set. (b) Existing set-level alignment methods minimize the distance between the two modality sets, which may cause misalignment of individual instances. (c) Our method first generates cross-modality paired images. (d) Instance-level alignment is then achieved by minimizing the distance between the images in each pair.

As shown in Figure 1 (b), existing methods only focus on overall set-level alignment while ignoring fine-grained instance-level alignment between images, which leads to instance misalignment and hurts performance. (Even though this misalignment could in principle be resolved with labels, the training and test sets in ReID do not share identities, so simply fitting the training set may not generalize well to the test set.)

Introduction

Comparison between different networks:
[Figure 2]
(a) In the edge-to-photo task, cross-modality paired images are available. By minimizing their distance in the feature space, the cross-modality gap can easily be reduced. (b) In the RGB-IR ReID task, we only have unpaired images. The appearance changes caused by the cross-modality discrepancy make the task more challenging. (c) The method in this paper can generate a well-matched cross-modality pair for a given image, which helps improve RGB-IR ReID. (d, e) Existing methods such as CycleGAN and StarGAN fail on this problem.

The method in this paper is inspired by the paired cross-modality images in Figure 2 (a). With paired images, we can directly **reduce the distance between paired images in the feature space** to reduce the instance-level discrepancy.
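
For example, here is a minimal PyTorch-style sketch of this idea (all names are hypothetical, not taken from the paper's released code): once each real image has a generated counterpart in the other modality, instance-level alignment is just a distance penalty between their features.

```python
import torch.nn.functional as F

def instance_level_alignment_loss(encoder, real_images, generated_pairs):
    """Pull each real image towards its generated cross-modality counterpart.

    encoder:         a modality-invariant feature extractor (hypothetical)
    real_images:     real images of one modality, shape (N, C, H, W)
    generated_pairs: generated images of the other modality for the same
                     people, in the same order, shape (N, C, H, W)
    """
    feat_real = encoder(real_images)      # (N, D) feature vectors
    feat_pair = encoder(generated_pairs)  # (N, D) feature vectors
    # Mean squared distance between the features of each pair.
    return F.mse_loss(feat_real, feat_pair)
```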

However, as shown in Figure 2 (b), all images in the RGB-IR ReID task are unpaired, because the two modalities are collected at different times: RGB images are captured in the daytime and IR images at night. We could use image translation models such as CycleGAN and StarGAN to transfer images from one modality to the other. However, these models only learn a **one-to-one** mapping, while translating from IR to RGB is a **one-to-many** problem (for example, the same gray level in an IR image can correspond to many different colors in RGB).

As a result, CycleGAN and StarGAN usually generate noisy images and cannot be directly applied to ReID. As shown in Figure 2 (d, e), the images they generate are unsatisfactory.

After this long build-up, the author finally gets to the substance.

To solve the above problems, the paper proposes a Joint Set-level and Instance-level Alignment Re-ID (JSIA-ReID) framework, which enjoys several merits.


Network Architecture:

[Figure 3: JSIA-ReID framework]
The framework proposed in this paper includes a cross-modality paired-image generation module G and a feature alignment module F. G first disentangles an image into modality-specific and modality-invariant features, and then decodes the exchanged features. F first uses a modality-invariant encoder for set-level alignment, and then minimizes the distance between the images in each pair for instance-level alignment. Finally, by training the two modules with a ReID loss, modality-aligned and identity-discriminative features can be learned simultaneously.
As shown in Figure 3, the framework includes a cross-modality paired-image generation module G and a feature alignment module F (which learns aligned features at both the set level and the instance level). The generation module G consists of three encoders and two decoders: the three encoders separate the modality-invariant features from the modality-specific features of the two modalities, and the decoders take modality-invariant and modality-specific features as inputs. By decoding the exchanged (swapped) features, the cross-modality paired images shown in Figure 2 (c) can be generated.
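To make the wiring concrete, here is a rough PyTorch-style sketch of the feature swap described above. It is a simplification under my own assumptions (module names, layers, and shapes are placeholders, not the authors' implementation): one modality-invariant encoder shared by both modalities, two modality-specific encoders, and two decoders that recombine a modality-invariant code with a modality-specific code.

```python
import torch
import torch.nn as nn

class PairedImageGenerator(nn.Module):
    """Sketch of the generation module G: 3 encoders + 2 decoders (placeholder layers)."""

    def __init__(self, dim=64):
        super().__init__()
        # Shared modality-invariant encoder (content: pose, clothing type, ...).
        self.enc_invariant = nn.Sequential(nn.Conv2d(3, dim, 4, 2, 1), nn.ReLU())
        # One modality-specific encoder per modality (style: color, texture, ...).
        self.enc_rgb_style = nn.Sequential(nn.Conv2d(3, dim, 4, 2, 1), nn.ReLU())
        self.enc_ir_style = nn.Sequential(nn.Conv2d(3, dim, 4, 2, 1), nn.ReLU())
        # One decoder per target modality, taking [content, style] codes as input.
        self.dec_rgb = nn.Sequential(nn.ConvTranspose2d(2 * dim, 3, 4, 2, 1), nn.Tanh())
        self.dec_ir = nn.Sequential(nn.ConvTranspose2d(2 * dim, 3, 4, 2, 1), nn.Tanh())

    def forward(self, x_rgb, x_ir):
        # Disentangle each image into modality-invariant and modality-specific codes.
        c_rgb, c_ir = self.enc_invariant(x_rgb), self.enc_invariant(x_ir)
        s_rgb, s_ir = self.enc_rgb_style(x_rgb), self.enc_ir_style(x_ir)
        # Swap: decode the RGB content with IR style (and vice versa) to obtain
        # cross-modality paired images of the same person.
        fake_ir = self.dec_ir(torch.cat([c_rgb, s_ir], dim=1))
        fake_rgb = self.dec_rgb(torch.cat([c_ir, s_rgb], dim=1))
        return fake_rgb, fake_ir
```

The feature alignment module F would then encode both the real images and the generated pairs with a modality-invariant encoder and apply the set-level, instance-level, and ReID losses on top of those features.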

Next, the JSIA-ReID framework structure and related concepts:

A detailed introduction to JSIA-ReID covers the following parts:

Cross-Modality Paired-Images Generation Module

Features Disentanglement.

Paired-Images Generation.

Reconstruction Loss.

Cycle-Consistency Loss.

GAN loss.

Feature Alignment Module

Set-Level Feature Alignment.

Instance-Level Feature Alignment.

Identity-Discriminative Feature Learning.



Source: blog.csdn.net/rytyy/article/details/105289005