NVIDIA open-sources DG-Net: generating high-quality pedestrian images with a GAN to assist person re-identification

A few days ago, NVIDIA open-sourced the code for DG-Net. Let's take a look at this CVPR19 oral paper.

The paper, "Joint Discriminative and Generative Learning for Person Re-identification", is a CVPR19 oral paper by researchers from NVIDIA, the University of Technology Sydney (UTS), and the Australian National University (ANU). Training deep learning models usually requires large amounts of labeled data, but collecting and annotating that data is often difficult. On this task, the authors explore using generated data to assist the training of person re-identification (re-ID). By improving the quality of the generated pedestrian images and fusing them into the re-ID model, the method improves the quality of the generated images and the accuracy of person re-identification at the same time.
Paper link: https://arxiv.org/abs/1904.07223
Bilibili video: https://www.bilibili.com/video/av51439240/
Tencent Video: https://v.qq.com/x/page/t0867x53ady.html

Code: https://github.com/NVlabs/DG-Net

[Figure: examples of generated pedestrian images]

Why (what were the pain points before this paper?)

  • Generating high-quality pedestrian images is difficult. The pedestrian images generated in some previous work [1-4] are of relatively low quality (see the figure above), mainly in two respects: 1. low fidelity: the generated pedestrians do not look real, and the images are blurry with unrealistic backgrounds; 2. extra annotations are needed to assist generation, such as human skeleton (pose) labels or attribute labels.
  • Using these low-quality generated images to train a person re-ID model introduces a bias with respect to the original dataset. Consequently, previous work either treated all generated pedestrian images as outliers to regularize the network [1, 2], trained an extra model on the generated images and ensembled it with the original model, or did not fully exploit the generated images during training.
  • Meanwhile, because annotation is difficult, person re-ID training sets (e.g., Market-1501 and DukeMTMC-reID) generally contain only about 20,000 images, far smaller than datasets such as ImageNet, so the problem of easy overfitting remains unsolved.

What (what does this paper propose to solve these problems?)

  • Without any extra annotation (such as pose, attributes, or keypoints), the model can generate high-quality pedestrian images. By swapping extracted features, the appearances of two pedestrian images are interchanged. These appearance variations come from the real training set rather than from random noise.
    [Figure: swapping the encoded appearances of two images]

  • No extra part matching is needed to improve the re-ID results. The generated images simply let the model see more training samples, which by itself improves the model. Given N real images, we first generate N×N training images and then use them to retrain the re-ID model, as in the sketch after this list. (In the figure below, the images in the first row and first column are real inputs; the remaining images are generated.)
    [Figure: N×N matrix of generated images]

  • Training forms a loop: the generated images are fed back into the re-ID model to learn good identity features, while the features extracted by the re-ID model are in turn fed to the generative module to improve the quality of the generated images.
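To make the feature-swapping idea concrete, here is a minimal, runnable sketch of the N×N generation. The module names (`Ea`, `Es`, `G`) and the tiny architectures are hypothetical stand-ins, not the actual DG-Net code (see https://github.com/NVlabs/DG-Net for the real implementation):

```python
import torch
import torch.nn as nn

# Toy appearance encoder: pools to one global code per image.
Ea = nn.Sequential(nn.Conv2d(3, 16, 4, 2, 1), nn.AdaptiveAvgPool2d(1))
# Toy structure encoder: keeps a spatial feature map.
Es = nn.Conv2d(3, 16, 4, 2, 1)

class G(nn.Module):
    """Toy decoder that combines an appearance code with a structure map."""
    def __init__(self):
        super().__init__()
        self.up = nn.ConvTranspose2d(32, 3, 4, 2, 1)

    def forward(self, a_code, s_map):
        a_map = a_code.expand(-1, -1, s_map.size(2), s_map.size(3))
        return self.up(torch.cat([a_map, s_map], dim=1))

gen = G()
imgs = torch.randn(4, 3, 128, 64)  # N = 4 "pedestrian images" (random stand-ins)
a_codes, s_maps = Ea(imgs), Es(imgs)

# Pair every appearance code with every structure map -> N x N generated images.
grid = [[gen(a_codes[i:i + 1], s_maps[j:j + 1]) for j in range(4)]
        for i in range(4)]
# grid[i][i] reconstructs image i; grid[i][j] renders person i's appearance
# in image j's pose and background.
```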

How (how does the paper achieve this?)

  • Defining the features:
    The paper first defines two kinds of features: an appearance feature and a structure feature. The appearance feature is related to the pedestrian's ID, while the structure feature is related to low-level visual cues. (A toy illustration follows the figure below.)

[Figure: definition of the appearance and structure features]
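The key asymmetry is that the appearance code is a global vector, which in the paper also serves as the re-ID embedding, while the structure code keeps its spatial layout. A toy illustration of the shapes involved (hypothetical layer sizes; the 751 classes match Market-1501's training identities):

```python
import torch
import torch.nn as nn

# Hypothetical shapes, not DG-Net's real architecture.
appearance_enc = nn.Sequential(             # -> one global code per image (ID-related);
    nn.Conv2d(3, 32, 4, 2, 1), nn.ReLU(),   #    in the paper this encoder doubles as
    nn.AdaptiveAvgPool2d(1), nn.Flatten())  #    the re-ID backbone
structure_enc = nn.Sequential(              # -> a spatial map (pose, background, other
    nn.Conv2d(3, 32, 4, 2, 1), nn.ReLU())   #    low-level layout)

x = torch.randn(2, 3, 128, 64)
a_code = appearance_enc(x)               # (2, 32): what the person looks like
s_code = structure_enc(x)                # (2, 32, 64, 32): where things are
id_logits = nn.Linear(32, 751)(a_code)   # e.g., 751 training identities in Market-1501
```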

  • Generation part:
    1. Same-ID generation: the appearance codes of different photos of the same person should be the same. As shown below, we can use a self-reconstruction loss (top of the figure, similar to an auto-encoder), and we can also generate the image through a positive sample with the same ID. Here we use a pixel-level L1 loss. (A minimal loss sketch follows the figure.)

[Figure: same-ID image reconstruction]
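A sketch of the two pixel-level L1 terms, with random tensors standing in for the generator outputs (`G`, `Ea`, `Es` refer to the toy modules above; variable names are illustrative):

```python
import torch
import torch.nn.functional as F

x1 = torch.rand(1, 3, 128, 64)  # an image of person A
x2 = torch.rand(1, 3, 128, 64)  # another image of the same person A

rec_self = torch.rand_like(x1)  # stands in for G(Ea(x1), Es(x1))
rec_pos = torch.rand_like(x1)   # stands in for G(Ea(x2), Es(x1))

loss_self = F.l1_loss(rec_self, x1)  # self-reconstruction (auto-encoder style)
loss_pos = F.l1_loss(rec_pos, x1)    # reconstruction through a same-ID positive sample
```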

  2. Cross-ID generation:
    This is the most critical part. Given two input images, we can swap their appearance and structure codes to generate two interesting outputs, as shown below. The corresponding losses are a GAN loss that maintains realism, and a feature reconstruction loss requiring that the corresponding appearance/structure codes can be recovered from the generated image. (A loss sketch follows the figure.)
    There is no random component in our network, so all variation in the generated images comes from the training set itself; the generated images therefore stay closer to the original training set.

[Figure: cross-ID generation by swapping appearance and structure codes]
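A sketch of the two cross-ID loss terms with dummy tensors (the non-saturating GAN form here is just one common choice, not necessarily the paper's exact adversarial loss):

```python
import torch
import torch.nn.functional as F

# x_ab = G(Ea(xa), Es(xb)): person A's appearance rendered in image b's structure.
d_fake = torch.rand(1, 1)            # stands in for D(x_ab), the discriminator score
loss_gan = F.binary_cross_entropy(   # realism: push D(x_ab) toward "real"
    d_fake, torch.ones_like(d_fake))

a_code = torch.rand(1, 32)           # Ea(xa): appearance code fed into G
a_rec = torch.rand(1, 32)            # Ea(x_ab): code re-extracted from the output
s_code = torch.rand(1, 32, 64, 32)   # Es(xb)
s_rec = torch.rand(1, 32, 64, 32)    # Es(x_ab)

# The generated image must still carry the codes it was built from.
loss_code = F.l1_loss(a_rec, a_code) + F.l1_loss(s_rec, s_code)
```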

  • The re-ID part:
    For real images, we still use the classification cross-entropy loss.
    For generated images, we use two losses. The first, L_prim, uses a well-trained baseline model as a teacher to provide soft labels for the generated images, minimizing the KL divergence between our predictions and the teacher's. The other loss, L_fine, mines the fine-grained details that are preserved after an image's appearance has been changed. (See the paper for details; a sketch of the teacher loss follows the figure.)

[Figure: identity supervision for real and generated images]
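A minimal sketch of the two supervision signals with dummy logits (the teacher is a frozen, separately trained baseline; names are illustrative):

```python
import torch
import torch.nn.functional as F

num_ids = 751                                   # e.g., Market-1501 training identities
real_logits = torch.randn(8, num_ids)           # predictions on real images
labels = torch.randint(num_ids, (8,))
loss_ce = F.cross_entropy(real_logits, labels)  # standard classification loss

fake_logits = torch.randn(8, num_ids)           # predictions on generated images
teacher_logits = torch.randn(8, num_ids)        # soft labels from the frozen teacher
# Match the student's distribution to the teacher's: KL(teacher || student).
loss_prim = F.kl_div(F.log_softmax(fake_logits, dim=1),
                     F.softmax(teacher_logits, dim=1),
                     reduction='batchmean')
```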

Results:

  • Qualitative results:
    1. Appearance swapping. We tested the results on three datasets; the method is relatively robust to occlusion and large illumination changes.

[Figure: appearance swapping results on three datasets]

  2. Appearance interpolation. Has the network merely memorized the generated images? To check, we ran an experiment that gradually changes the appearance code; the appearance changes gradually and smoothly. (A minimal interpolation sketch follows the figure.)

[Figure: gradual appearance interpolation]
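One way to realize such an experiment, sketched with dummy codes (shapes follow the toy modules above):

```python
import torch

a1 = torch.rand(1, 32)             # appearance code of person A
a2 = torch.rand(1, 32)             # appearance code of person B
s = torch.rand(1, 32, 64, 32)      # structure code held fixed

for t in torch.linspace(0, 1, steps=8):
    a_mix = (1 - t) * a1 + t * a2  # linearly blend the two appearance codes
    # Decoding each a_mix with the generator yields a smooth appearance morph.
```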

  3. Failure cases. Uncommon patterns, such as logos, cannot be recovered.

[Figure: failure cases]

  • Quantitative results:
    1. Comparison of the realism (FID) and diversity (SSIM) of the generated images. Lower FID is better; higher SSIM is better. (A note on computing these metrics follows the table.)

[Table: FID and SSIM comparison]
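For reference, both metrics are available in common open-source tools; a hedged example (these are generic tools, not necessarily the paper's exact evaluation scripts):

```python
# FID between folders of real and generated images, via the pytorch-fid package:
#   python -m pytorch_fid path/to/real path/to/generated
# SSIM between two images, via scikit-image (>= 0.19 for channel_axis):
import numpy as np
from skimage.metrics import structural_similarity

img1 = np.random.rand(128, 64, 3)  # stand-ins for two generated images
img2 = np.random.rand(128, 64, 3)
score = structural_similarity(img1, img2, channel_axis=2, data_range=1.0)
```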

  2. Re-ID results on multiple datasets (Market-1501, DukeMTMC-reID, MSMT17, CUHK03-NP).

[Tables: re-ID results on the four datasets]

Appendix: video demo


Bilibili video backup: https://www.bilibili.com/video/av51439240/
Tencent Video backup: https://v.qq.com/x/page/t0867x53ady.html

Finally, thank you for reading. Since we are still in an early stage of exploration, we have inevitably not thought through some issues thoroughly enough. If you find anything unclear, please feel free to raise your valuable comments and discuss them with us. Thank you!

References

[1] Z. Zheng, L. Zheng, and Y. Yang. Unlabeled samples generated by GAN improve the person re-identification baseline in vitro. ICCV, 2017.
[2] Y. Huang, J. Xu, Q. Wu, Z. Zheng, Z. Zhang, and J. Zhang. Multi-pseudo regularized label for generated samples in person re-identification. TIP, 2018.
[3] X. Qian, Y. Fu, T. Xiang, W. Wang, J. Qiu, Y. Wu, Y.-G. Jiang, and X. Xue. Pose-normalized image generation for person re-identification. ECCV, 2018.
[4] Y. Ge, Z. Li, H. Zhao, G. Yin, X. Wang, and H. Li. FD-GAN: Pose-guided feature distilling GAN for robust person re-identification. NIPS, 2018.

About the authors

The first author, Zhedong Zheng, is a Ph.D. student at the School of Computer Science, University of Technology Sydney, expected to graduate in June 2021. This paper is the result of his internship at NVIDIA.

Zhedong Zheng has published eight papers. One of them, an ICCV17 spotlight that first proposed using GAN-generated images to assist feature learning for person re-identification, has been cited more than 300 times. Another, a TOMM journal article, was selected as a 2018 Web of Science highly cited paper and has been cited more than 200 times. He has also contributed a person re-ID benchmark codebase to the community, which has more than 1,000 stars on GitHub and is widely used.

In addition, the other authors of the paper include Xiaodong Yang, a video expert at NVIDIA Research; Zhiding Yu, an expert in face recognition (author of SphereFace and Large-Margin Softmax); Dr. Liang Zheng, an expert in person re-identification; Zhedong Zheng's advisor, Prof. Yi Yang (who has three CVPR oral papers this year); and Jan Kautz, VP at NVIDIA Research.

Zhedong Zheng's website: http://zdzheng.xyz/
