Mix and match outfits and say goodbye to plaid shirts: this GAN renders realistic, high-definition outfit combinations

This article is reproduced with authorization from "Machine Heart"
Editor: Yuliang
Link: https://arxiv.org/pdf/1908.08847.pdf

Fashion e-commerce platforms simplify clothing purchases through search and personalization, and virtual try-on can further enhance the user experience. Previous research has mainly focused on re-dressing fashion models in existing images [5, 2] or on generating low-resolution images from scratch given a pose and clothing color [8]. This article focuses on generating high-resolution images of fashion models wearing a desired outfit in a specified pose.

In recent years, advances in generative adversarial networks (GANs) [1] have allowed researchers to sample realistic images through implicit generative modeling. One such advance is StyleGAN [7], which builds on Progressive GAN [6] to generate high-resolution images and modulates them through adaptive instance normalization (AdaIN) [4]. In this article, the authors apply and modify StyleGAN on a data set of posed fashion model images. First, the original StyleGAN is trained on a set of fashion model images; the results show that the clothing color and body pose of one generated fashion model can be transferred to another. Second, the authors modify StyleGAN to condition the generation process on a given outfit and human pose. This makes it possible to quickly visualize custom outfits under different body poses and body shapes.

Figure 1: Samples from the data set (red markers show the extracted key points).

Clothing dataset

The authors use a proprietary image data set with approximately 380K entries. Each entry contains a fashion model in a specific pose wearing a specific outfit, with at most six items per outfit. To obtain the body pose, the authors use a deep pose estimator [10] to extract 16 key points. Figure 1 visualizes some samples from the data set; the red marks on the fashion models are the extracted key points. Both the model images and the individual item images have a resolution of 1024×768.

Experiments

The flow of the unconditional StyleGAN is shown in Figure 2(a). The model has 18 generator layers, each of which takes an affine-transformed copy of the style vector as input for adaptive instance normalization. The discriminator is the same as in the original StyleGAN. The authors trained this network for 160 epochs on four NVIDIA V100 GPUs, which took about four weeks.

Figure 2(a): Flow chart of the unconditional StyleGAN.
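For readers unfamiliar with AdaIN, the sketch below illustrates the mechanism: each layer's feature maps are normalized and then rescaled with a per-channel scale and bias computed from an affine transform of the style vector. This is a generic PyTorch sketch of the technique, not the paper's code; the dimensions are illustrative.

```python
import torch
import torch.nn as nn

class AdaIN(nn.Module):
    """Adaptive instance normalization: normalize each feature map, then
    modulate it with a per-channel scale and bias produced by an affine
    transform of the style vector."""

    def __init__(self, style_dim, num_channels):
        super().__init__()
        self.norm = nn.InstanceNorm2d(num_channels, affine=False)
        # Affine-transformed copy of the style vector:
        # one (scale, bias) pair per channel.
        self.affine = nn.Linear(style_dim, 2 * num_channels)

    def forward(self, x, w):
        scale, bias = self.affine(w).chunk(2, dim=1)
        x = self.norm(x)
        return (1 + scale[:, :, None, None]) * x + bias[:, :, None, None]

# Example: modulate a 512-channel feature map with a 512-d style vector.
adain = AdaIN(style_dim=512, num_channels=512)
y = adain(torch.randn(2, 512, 8, 8), torch.randn(2, 512))  # shape (2, 512, 8, 8)
```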

In the conditional version, the authors modify StyleGAN with an embedding network, as shown in Figure 2(b). The network's input consists of six item images (18 channels in total) and a 16-channel heat map computed from the 16 key points. The item images are concatenated in a fixed order to maintain semantic consistency across outfits; the ordering is shown in Figure 1.

Figure 2(b): Flow chart of the conditional StyleGAN.
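The article does not specify how the 16-channel heat map is built from the key points; a common choice is to place one Gaussian bump per key point in its own channel. Below is a minimal NumPy sketch under that assumption (the function name and the sigma value are hypothetical):

```python
import numpy as np

def keypoints_to_heatmap(keypoints, height=256, width=192, sigma=6.0):
    """Rasterize 16 (x, y) key points into a 16-channel heat map,
    one Gaussian bump per key point (sigma is an assumed bump width)."""
    ys = np.arange(height, dtype=np.float32)[:, None]   # (H, 1)
    xs = np.arange(width, dtype=np.float32)[None, :]    # (1, W)
    heatmap = np.zeros((len(keypoints), height, width), dtype=np.float32)
    for c, (x, y) in enumerate(keypoints):
        d2 = (xs - x) ** 2 + (ys - y) ** 2               # squared pixel distance
        heatmap[c] = np.exp(-d2 / (2.0 * sigma ** 2))
    return heatmap

# 16 random key points -> a (16, 256, 192) heat map tensor.
kp = np.random.rand(16, 2) * [192, 256]
hm = keypoints_to_heatmap(kp)
```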

If an outfit has no item for a given semantic category, that slot is filled with a plain gray image. The embedding network produces a 512-dimensional vector, which is concatenated with the latent vector to generate the style vector. This model was also trained for four weeks (115 epochs). The discriminator in the conditional model uses a separate network to compute embedding vectors for the input items and the heat map, and then uses the method in [9] to compute the final score.
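The paper does not spell out the embedding network's architecture, so the sketch below only mirrors the data flow described above: the six item images (18 channels) are stacked with the 16-channel heat map, reduced to a 512-dimensional embedding, and concatenated with the latent vector. All layer sizes here are our own assumptions.

```python
import torch
import torch.nn as nn

class ConditionEmbedding(nn.Module):
    """Embed six item images (18 ch) plus a 16-channel pose heat map
    into a 512-d vector; the layer sizes are illustrative assumptions."""

    def __init__(self, out_dim=512):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(18 + 16, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(256, out_dim)

    def forward(self, items, heatmap):
        # items: (B, 18, H, W) -- six RGB item images in the fixed slot order;
        # heatmap: (B, 16, H, W) -- one channel per key point.
        h = self.conv(torch.cat([items, heatmap], dim=1))
        return self.fc(h.flatten(1))

# The 512-d embedding is concatenated with the latent vector z to
# produce the style vector fed to the generator.
embed = ConditionEmbedding()
items = torch.randn(1, 18, 256, 192)     # gray fill for missing slots
heatmap = torch.randn(1, 16, 256, 192)
z = torch.randn(1, 512)
style = torch.cat([embed(items, heatmap), z], dim=1)  # (1, 1024)
```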

Unconditional results

In Figure 3 below, the authors show images generated by the unconditional model. Both the items and the body parts are generated realistically at the full resolution of 1024×768 pixels. During training, the generator is regularized by swapping the style vectors of certain layers, an operation that enables transferring information between generated images.

Figure 3: Model images generated by the unconditional StyleGAN.

In Figure 4 below, the authors give two examples of this transfer. First, propagating the same source style vector to layers 13 through 18 of the generator (before the affine transformation in Figure 2) transfers the color of the source clothing to the target generated image, as shown in Figure 4. Copying the source style vector into the lower layers instead transfers the body pose.

Figure 4: Transferring clothing color or body pose from a source model to a target generated model.

Table 1 shows at which generator layers the source and target style vectors are propagated to achieve each transfer effect.

Table 1: The layers used to propagate the style vector.
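In code, this style mixing amounts to choosing, per generator layer, whether to feed the target or the source style vector. A sketch assuming a hypothetical `generator` that accepts one style vector per layer; only the color layers (13-18) are stated in the text, so the pose layer range below is an assumption standing in for Table 1's exact values:

```python
import torch

NUM_LAYERS = 18                 # generator layers, as in the article
COLOR_LAYERS = range(12, 18)    # layers 13-18 (0-indexed): clothing color
POSE_LAYERS = range(0, 4)       # assumed lower layers: body pose

def mix_styles(w_target, w_source, layers):
    """Build a per-layer list of style vectors: the target style
    everywhere, with the source style copied into `layers`."""
    styles = [w_target] * NUM_LAYERS
    for i in layers:
        styles[i] = w_source
    return styles

# Transfer the source outfit's color onto the target model.
w_src, w_tgt = torch.randn(1, 512), torch.randn(1, 512)
styles = mix_styles(w_tgt, w_src, COLOR_LAYERS)
# image = generator(styles)   # hypothetical per-layer style interface
```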

Conditional results

After training the conditional model, one can input a set of desired items and a specific pose to visualize the resulting outfit, as shown in Figure 5. Figures 5(a) and (b) show two outfits used to generate the images, and Figures 5(c) and (d) show models generated with four randomly selected poses. The items are rendered correctly on the generated bodies, and the pose is consistent across the different outfits. Figure 5(e) shows the visualization generated by adding the jacket from the first outfit to the second outfit; the texture and size of the denim jacket are rendered correctly on the fashion model. Note that, due to spurious correlations in the data set, the face of the generated model may vary with the clothing and pose.

Figure 5: Two different outfits (a) and (b) are used to generate the model images in (c) and (d). In (e), the jacket from outfit #1 is added to outfit #2 for a customized visualization.
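Because the item images are concatenated in a fixed slot order, customizing an outfit as in Figure 5(e) reduces to replacing one slot's channels before re-embedding. A sketch under the assumption that the jacket occupies the first slot (the actual slot assignment is not given in the article):

```python
import torch

# Six item slots x 3 RGB channels = 18 channels, in the fixed slot order.
items_outfit1 = torch.randn(1, 18, 256, 192)
items_outfit2 = torch.randn(1, 18, 256, 192)

# Assume the jacket occupies the first slot, i.e. channels 0-2.
custom = items_outfit2.clone()
custom[:, 0:3] = items_outfit1[:, 0:3]
# `custom` is then embedded together with a pose heat map and passed
# through the conditional generator, as in the embedding sketch above.
```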

To account for differences in gender, body size, and weight, the data set contains fashion models of various body types, and the relative distances between the extracted key points implicitly encode these differences. The conditional model can capture and reproduce fashion models of different body types, as shown in the fourth generated image in Figure 5. This result is encouraging, and the method may in the future be extended to a wider range of users through virtual try-on applications.

Quantitative results

The authors evaluate the quality of the generated images by computing the Fréchet Inception Distance (FID) [3] for both the unconditional and the conditional GAN. As Table 2 shows, the unconditional GAN produces higher-quality images, a conclusion that can also be drawn by comparing Figure 3 and Figure 5. The conditional discriminator has the additional task of checking whether the input clothing and pose are generated correctly, which can lead to a trade-off between image quality (or "realism") and direct control over the generated clothing and pose.

Table 2: FID scores of the models.
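For reference, FID fits a Gaussian to Inception-v3 features of the real and the generated images and measures the Fréchet distance between the two Gaussians. Below is a minimal sketch of the metric itself, with the feature extraction omitted and toy dimensions substituted for the usual 2048-d Inception features:

```python
import numpy as np
from scipy import linalg

def fid(mu1, sigma1, mu2, sigma2):
    """Fréchet distance between two Gaussians fitted to image features:
    FID = ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^{1/2})."""
    diff = mu1 - mu2
    covmean = linalg.sqrtm(sigma1 @ sigma2).real   # drop tiny imaginary parts
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))

# Toy example with 64-d features (real Inception pool features are 2048-d).
real = np.random.randn(500, 64)
fake = np.random.randn(500, 64)
score = fid(real.mean(0), np.cov(real, rowvar=False),
            fake.mean(0), np.cov(fake, rowvar=False))
```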
