[Paper Notes] Application of Generative Adversarial Networks in Person Re-Identification

Person Transfer GAN to Bridge Domain Gap for Person Re-Identification


Summary

Although the performance of person re-identification (ReID) has improved significantly, many challenging problems in real scenes have not been fully studied:
for example, complex scenes and lighting changes, variations in viewpoint and pose, and a large number of identities in the camera network. To promote research on these issues, the paper contributes a new data set, MSMT17, which has several important features:

  • The raw video was taken by a network of 15 cameras deployed in indoor and outdoor scenes;
  • The video covers a long time span and presents complex lighting changes;
  • It contains the largest number of annotations to date: 4,101 identities and 126,441 bounding boxes.

In addition, when training and testing on different data sets, a domain gap usually exists between them, which leads to severe performance degradation. This means that available training data cannot be used effectively in new test domains.

To reduce the cost of annotating new training samples, the authors propose a Person Transfer Generative Adversarial Network (PTGAN) to bridge the domain gap.

Introduction

The Introduction section of the paper briefly presents the problems faced by person re-identification and the proposed solutions (takeaway: introduce which aspects of the problem are solved, without rambling).
The task of person re-identification is to find and return the target person (the probe) from a massive data set (the gallery) collected by a camera network. Although current performance is good, two problems remain:

First, existing public data sets differ from data collected in real scenes. For example, current data sets either contain a limited number of identities or were collected in constrained environments.

Another challenge is the domain gap between different person ReID data sets; that is, training and testing on different person ReID data sets causes severe performance degradation. For example, a model trained on CUHK03 [20] achieves only 2.0% Rank-1 accuracy when tested on PRID [10]. (The authors attribute this to many possible causes: the domain gap may arise from different lighting conditions, resolutions, human races, seasons, backgrounds, and so on.)

The contribution of the paper is summarized in three points:

  • The MSMT17 data set;
  • The PTGAN model;
  • A discussion of open problems in person re-identification.

Related work

Related work is essentially a review of what others have done before. This part of the article is divided into two sections: the first reviews work related to ReID, and the second covers work related to GANs. We will not go into details here.

MSMT17 is a large-scale ReID data set. The accuracy on some public data sets has already been pushed to a very high level, so the release of this data set further drives the development of ReID.
The MSMT17 data set has the following characteristics:

  • The data collection time is about 180 hours;
  • There are 15 cameras in total: 12 outdoor and 3 indoor;
  • Pedestrian bounding boxes were produced automatically by a Faster R-CNN detector;
  • In total there are 126,441 bounding boxes of 4,101 pedestrians.

This data set has now been made public; the data set overview linked in the previous article contains its download address, so it should be available online.
An intuitive comparison of this data set with others is given in a figure in the original article. The new features of the MSMT17 data set are as follows:

  • More identities, bounding boxes, and cameras;
  • Complex scenes and backgrounds;
  • Multiple capture periods, resulting in severe lighting changes;
  • More reliable bounding boxes.

The article also provides a more detailed data comparison (table figure in the original article).

Person Transfer GAN(PTGAN)

Person Transfer GAN (PTGAN) is a GAN for the ReID problem proposed by the authors. Its biggest feature is that it transfers the background domain while preserving the pedestrian foreground as much as possible.

The article notes that this kind of transfer does not simply mean cutting a person out of one picture and pasting them onto the background of another. PTGAN is designed so that the transfer from A to B satisfies two constraints: style transfer and identity keeping.

The goal of style transfer is to learn style mapping functions between different person data sets. The purpose of identity keeping is to ensure that a person's identity remains unchanged after the transfer. Since different transferred samples of a person are assigned the same person ID, this identity constraint is very important for training the person ReID model.

The loss function of the PTGAN network consists of two parts:
$$L_{PTGAN} = L_{Style} + \lambda_1 L_{ID}$$

where $L_{Style}$ is the generated style loss, or domain loss, i.e. whether the generated image looks like the style of the new data set, and $L_{ID}$ is the ID loss of the generated image, i.e. whether the generated image shows the same person as the original. $\lambda_1$ balances the weight of the two losses. The key is how these two losses are defined.
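As a rough illustration, the combined objective might be computed like this in PyTorch (a minimal sketch; the function name and the value of `lambda_1` are placeholders of mine, not the authors' code):

```python
import torch

def ptgan_loss(style_loss: torch.Tensor,
               id_loss: torch.Tensor,
               lambda_1: float = 10.0) -> torch.Tensor:
    # Total objective: style (domain) loss plus weighted identity loss.
    # lambda_1 here is a placeholder value, not taken from the paper.
    return style_loss + lambda_1 * id_loss
```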

PTGAN is built on CycleGAN, so its style loss is essentially the standard CycleGAN loss. The first part, $L_{Style}$, is the standard CycleGAN objective:

$$L_{Style} = L_{GAN}(G, D_B, A, B) + L_{GAN}(\overline{G}, D_A, B, A) + \lambda_2 L_{Cyc}(G, \overline{G})$$

where $L_{GAN}$ is the standard adversarial loss and $L_{Cyc}$ is the cycle consistency loss. This part is covered in the CycleGAN paper.
These terms are the usual CycleGAN losses, ensuring that the generated image matches the domain of the target data set.
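For reference, here is a minimal sketch of how the generator-side CycleGAN terms could be computed, assuming a least-squares adversarial loss and an L1 cycle loss as in the original CycleGAN (the network modules `G`, `G_bar`, `D_A`, `D_B` are placeholders):

```python
import torch
import torch.nn.functional as F

def adv_loss(pred: torch.Tensor) -> torch.Tensor:
    # Least-squares GAN loss for the generator: push D's output toward 1 ("real").
    return F.mse_loss(pred, torch.ones_like(pred))

def style_loss(G, G_bar, D_A, D_B, real_a, real_b, lambda_2: float = 10.0):
    """Standard CycleGAN objective seen by the generators:
    two adversarial terms plus an L1 cycle-consistency term."""
    fake_b = G(real_a)      # transfer A -> B
    fake_a = G_bar(real_b)  # transfer B -> A

    # Adversarial terms: generated images should fool the discriminators.
    adv = adv_loss(D_B(fake_b)) + adv_loss(D_A(fake_a))

    # Cycle consistency: A -> B -> A and B -> A -> B should reconstruct the input.
    cyc = F.l1_loss(G_bar(fake_b), real_a) + F.l1_loss(G(fake_a), real_b)

    return adv + lambda_2 * cyc
```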

The paper's other improvement is $L_{ID}$. To ensure that the foreground remains unchanged during image transfer, PSPNet is first applied to the image for foreground segmentation, producing a mask region.

Traditional CycleGAN was not designed for the ReID task, so it has no need to keep the identity information of foreground objects unchanged. As a result, the foreground may become blurry and of poor quality; worse, the appearance of the pedestrian may change, e.g. the color of the clothes, which is very undesirable for the ReID task.

To solve this problem, the paper proposes the $L_{ID}$ loss. The foreground extracted by PSPNet is a mask $M$, and the final ID loss is:
$$L_{ID} = \mathbb{E}_{a \sim p_{data}(a)}\left[\|(G(a) - a) \odot M(a)\|_2\right] + \mathbb{E}_{b \sim p_{data}(b)}\left[\|(\overline{G}(b) - b) \odot M(b)\|_2\right]$$

Here, $M(a)$ and $M(b)$ are the two segmented foreground masks, and the ID loss constrains the pedestrian foreground to stay as unchanged as possible during transfer. The effect of the final conversion is shown in a figure in the original article: intuitively, compared with traditional CycleGAN, PTGAN better preserves the ID information of the pedestrian.
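A sketch of the masked identity loss, assuming the foreground masks have already been produced by a segmentation network such as PSPNet (the tensor layout and the averaged squared penalty are my assumptions, one plausible reading of the equation above):

```python
import torch

def id_loss(G, G_bar, real_a, real_b, mask_a, mask_b):
    """Identity-keeping loss: penalize changes inside the foreground mask.

    mask_a / mask_b are binary foreground masks (1 = pedestrian), e.g.
    produced by PSPNet, broadcast over the channel dimension.
    """
    diff_a = (G(real_a) - real_a) * mask_a       # foreground change, A -> B
    diff_b = (G_bar(real_b) - real_b) * mask_b   # foreground change, B -> A
    # Squared L2 penalty averaged over the batch; one plausible reading
    # of the ||.||_2 norm in the paper's equation.
    return diff_a.pow(2).mean() + diff_b.pow(2).mean()
```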

Experimental details

The network structure of PTGAN is similar to that of CycleGAN. The generator consists of convolutions with stride 2, 9 residual blocks, and two fractionally-strided convolutions with stride 1/2 (i.e., transposed convolutions). The discriminator network consists of two parts.
One part is PatchGAN, which classifies whether each 70×70 patch of the image is real or fake.
The other part computes the $L_2$ distance between the transferred image and the input image on the person foreground.
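A compact sketch of this architecture, following the CycleGAN reference design (channel widths, normalization, and layer counts beyond the 9 residual blocks are my assumptions, not taken from the paper):

```python
import torch.nn as nn

class ResBlock(nn.Module):
    """Residual block used in the generator body."""
    def __init__(self, ch: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.InstanceNorm2d(ch), nn.ReLU(True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.InstanceNorm2d(ch),
        )

    def forward(self, x):
        return x + self.body(x)

def make_generator(ch: int = 64, n_blocks: int = 9) -> nn.Sequential:
    """CycleGAN-style generator: stride-2 downsampling, 9 residual blocks,
    then two fractionally-strided (transposed) convolutions."""
    layers = [nn.Conv2d(3, ch, 7, padding=3), nn.ReLU(True),
              nn.Conv2d(ch, ch * 2, 3, stride=2, padding=1), nn.ReLU(True),
              nn.Conv2d(ch * 2, ch * 4, 3, stride=2, padding=1), nn.ReLU(True)]
    layers += [ResBlock(ch * 4) for _ in range(n_blocks)]
    layers += [nn.ConvTranspose2d(ch * 4, ch * 2, 3, stride=2, padding=1,
                                  output_padding=1), nn.ReLU(True),
               nn.ConvTranspose2d(ch * 2, ch, 3, stride=2, padding=1,
                                  output_padding=1), nn.ReLU(True),
               nn.Conv2d(ch, 3, 7, padding=3), nn.Tanh()]
    return nn.Sequential(*layers)

def make_patch_discriminator(ch: int = 64) -> nn.Sequential:
    """70x70 PatchGAN: outputs a grid of real/fake scores, one per patch."""
    return nn.Sequential(
        nn.Conv2d(3, ch, 4, stride=2, padding=1), nn.LeakyReLU(0.2, True),
        nn.Conv2d(ch, ch * 2, 4, stride=2, padding=1),
        nn.InstanceNorm2d(ch * 2), nn.LeakyReLU(0.2, True),
        nn.Conv2d(ch * 2, ch * 4, 4, stride=2, padding=1),
        nn.InstanceNorm2d(ch * 4), nn.LeakyReLU(0.2, True),
        nn.Conv2d(ch * 4, ch * 8, 4, stride=1, padding=1),
        nn.InstanceNorm2d(ch * 8), nn.LeakyReLU(0.2, True),
        nn.Conv2d(ch * 8, 1, 4, stride=1, padding=1),
    )
```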
PTGAN uses the Adam optimizer. The learning rate is set to 0.0002 for the generator networks and 0.0001 for the discriminator networks. PTGAN is trained for 40 epochs.
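The optimizer setup translates directly into code; a minimal sketch using the networks from the architecture sketch above (the Adam betas are common GAN defaults and an assumption, not reported in the notes):

```python
import itertools
import torch

# Networks from the architecture sketch above.
G, G_bar = make_generator(), make_generator()
D_A, D_B = make_patch_discriminator(), make_patch_discriminator()

# Learning rates as reported in the notes: 2e-4 for the generators,
# 1e-4 for the discriminators. The betas are common GAN defaults (assumed).
opt_G = torch.optim.Adam(itertools.chain(G.parameters(), G_bar.parameters()),
                         lr=2e-4, betas=(0.5, 0.999))
opt_D = torch.optim.Adam(itertools.chain(D_A.parameters(), D_B.parameters()),
                         lr=1e-4, betas=(0.5, 0.999))

for epoch in range(40):  # the notes report 40 training epochs
    ...  # alternate discriminator and generator updates on each batch
```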

Conclusion

This paper is an attempt to apply GANs to person re-identification. The proposed PTGAN network keeps pedestrian ID information unchanged while performing style transfer. The data set released with the paper is also a contribution to the field of person re-identification.

Source: blog.csdn.net/qq_37747189/article/details/109953560