Fun with the StyleGAN2 model: generating anime characters


Word count: 2840 · Reading time: 6 minutes

A Generative Adversarial Network (GAN) is a generative model that can create new content. The topic is very popular in the machine learning field because of its interesting applications, such as generating synthetic training data, creating art, style transfer, image-to-image translation, and more.

Posted by Fathy Rashad 

URL: https://www.yanxishe.com/TextTranslation/2826


Generative Adversarial Networks

GAN Architecture [Image by Author]

A GAN is composed of two networks: a generator and a discriminator. The generator tries to produce fake samples and trick the discriminator into believing they are real, while the discriminator tries to tell real samples apart from generated (fake) ones. This interesting adversarial concept was proposed by Ian Goodfellow in 2014. There are already many resources available for learning about GANs, so I will not explain them here to avoid redundancy.

I recommend reading this article by Joseph Rocca to learn about GANs:

Understand Generative Adversarial Networks (GANs)

https://medium.com/m/global-identity?redirectUrl=https%3A%2F%2Ftowardsdatascience.com%2Funderstanding-generative-adversarial-networks-gans-cd6e4651a29

StyleGAN2

In 2018, NVIDIA released the StyleGAN paper, "A Style-Based Generator Architecture for Generative Adversarial Networks". The paper proposes a new generator architecture for GANs that allows them to control different levels of detail in the generated samples, from coarse details (such as head shape) to finer details (such as eye color).

StyleGAN also incorporates the idea of Progressive GAN: the network is initially trained at a low resolution (4x4), and larger layers are gradually added once training stabilizes. This makes training faster and more stable.

Progressive growing of GANs [Source: Sarah Wolf]

StyleGAN improves on this further by adding a mapping network that encodes the input vector z into an intermediate latent space, w, whose values are then fed separately into the generator to control the different levels of detail.

StyleGAN generator architecture [Image by Author]

Why add a mapping network?

One of the problems with GANs is their entangled latent representation (the input vector z). For example, suppose we have a 2-dimensional latent code whose dimensions represent the size of the face and the size of the eyes. In this case, face size is highly entangled with eye size (bigger eyes would mean a bigger face). Instead, we could store the eye-to-face ratio, which would make our model simpler, because a disentangled representation is easier for the model to interpret.

With an entangled representation, the data distribution does not necessarily follow the normal distribution from which we want to sample the input vector z. For example, the data distribution may have a missing corner like the one below, representing a region where the eye-to-face ratio becomes unrealistic.

[Source: Paper]

If we sample z from a normal distribution, our model will also try to generate samples in that missing region, where the ratio is unrealistic, and because there is no training data with those characteristics, the generator will produce poor images. Therefore, the purpose of the mapping network is to disentangle the latent representation and warp the latent space so that it can still be sampled from a normal distribution.

[Source: Paper]

In addition, the intermediate vector w is fed separately to each level of the generator, so the generator can control the visual features at different levels. The first few layers (4x4, 8x8) control higher-level (coarser) details such as head shape, pose, and hairstyle, while the last few layers (512x512, 1024x1024) control finer details, such as hair and eye color.

Changes in coarse-level details (head shape, hairstyle, pose, glasses) [Source: Paper]

Changes in fine-level details (hair color) [Source: Paper]

For the complete details of the StyleGAN architecture, I suggest you read NVIDIA's official paper. Below is the description and architecture diagram of the entire system from the paper itself.

Style-based generator architecture [Source: Paper]

Stochastic variation

StyleGAN also lets you control the stochastic variation at different levels of detail by injecting noise into each layer. Stochastic variation is minor randomness in the image that does not change our perception of it or its identity, such as differently combed hair, slightly different hair placement, and so on. You can see the effect of this variation in the animated images below.

Coarse stochastic variation [Source: Paper]

Fine stochastic variation [Source: Paper]

StyleGAN makes several other improvements that I will not cover in this article, such as AdaIN normalization and other regularization techniques. For further details, you can read the official paper, this article by Jonathan Hui, or this article by Rani Horev.

Truncation trick

When the training data contains samples that are not well represented, the generator may fail to learn them and produce poor results. To avoid this, StyleGAN uses a "truncation trick": the intermediate latent vector w is truncated so that it stays close to the average.

ψ (psi) is the threshold used to truncate and resample latent vectors that lie beyond it. With a higher ψ you therefore get more diversity in the generated images, but also a higher chance of generating odd or broken faces. According to Gwern, a ψ between 0.5 and 0.7 seems to give good images with sufficient diversity for this network. That said, feel free to experiment with the threshold.

3x3 grids of images generated with ψ = 0.3 (left) vs 0.7 (middle) vs 1.3 (right)
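The idea can be sketched as a one-line formula: the intermediate latent w is pulled toward the average latent w_avg (a conceptual sketch, not the repo's exact code).

def truncate(w, w_avg, psi=0.7):
    # psi = 1.0 keeps w unchanged; smaller psi pulls w toward the "average face",
    # trading diversity for image quality.
    return w_avg + psi * (w - w_avg)

In the pre-trained generator used below, this behavior is exposed through the truncation_psi argument.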

Generate anime characters

I will use Aaron Gokaslan's pre-trained anime StyleGAN2 model so that we can load it and directly generate anime faces. So, open your Jupyter notebook or Google Colab and let's start coding.

Note: If you encounter difficulties, you can refer to my Colab notebook

First, we should clone the StyleGAN2 repo.

$ git clone https://github.com/NVlabs/stylegan2.git

If you are using Google Colab, you can prefix the command with "!" to run it as a shell command: !git clone https://github.com/NVlabs/stylegan2.git

Next, we need to download the pre-trained weights and load the model. If you are using Google Colab, make sure you are running with a GPU runtime, because the model is configured to use a GPU.

This code is modified from this notebook
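A minimal sketch of this step, using the pretrained_networks.load_networks helper from the cloned repo (the .pkl path below is a placeholder; point it at wherever you saved Aaron Gokaslan's pre-trained weights):

import sys
sys.path.insert(0, "stylegan2")  # the repo we just cloned

import numpy as np
import PIL.Image
import dnnlib
import dnnlib.tflib as tflib
import pretrained_networks

# Placeholder path: replace with the location of the downloaded anime weights.
network_pkl = "network-snapshot-anime.pkl"

# Returns the generator, the discriminator, and Gs (the averaged generator used for inference).
_G, _D, Gs = pretrained_networks.load_networks(network_pkl)

# Keyword arguments reused for every Gs.run() call below.
Gs_kwargs = dnnlib.EasyDict()
Gs_kwargs.output_transform = dict(func=tflib.convert_images_to_uint8, nchw_to_nhwc=True)
Gs_kwargs.randomize_noise = False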

Now, we need to generate a random vector z to use as input for our generator. Let's create a function that generates a latent code z from a given seed.
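One way to write it, following the seed-to-latent convention used in the repo's run_generator.py (the function name is just for illustration):

def generate_zs_from_seeds(seeds):
    # Each seed deterministically maps to one latent vector z of shape [1, 512].
    return [np.random.RandomState(seed).randn(1, *Gs.input_shape[1:]) for seed in seeds]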

Then, we can create a function that generates images from the random vectors z.
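A sketch of such a function, built on Gs.run() and the Gs_kwargs defined in the loading step above (truncation_psi is the ψ discussed earlier):

def generate_images(zs, truncation_psi=0.7):
    # Run the generator on each latent vector and convert the outputs to PIL images.
    Gs_kwargs.truncation_psi = truncation_psi
    imgs = []
    for z in zs:
        images = Gs.run(z, None, **Gs_kwargs)  # shape: [minibatch, height, width, channels]
        imgs.append(PIL.Image.fromarray(images[0], 'RGB'))
    return imgs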

Now, we can try to generate some images and see the results.
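For example (the seeds are arbitrary; any integers will do):

zs = generate_zs_from_seeds([42, 128, 512, 777, 1024, 7, 13, 99, 2020])
imgs = generate_images(zs, truncation_psi=0.7)
imgs[0]  # show the first generated face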

The function returns a list of PIL.Image objects. In Google Colab, you can display an image simply by evaluating the variable. Here is the first image generated.

Image by Author

Let's display the images in a grid so that we can see several of them at once.

Then we can display the generated images in a 3x3 grid.
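A simple grid helper using PIL (the 256-pixel cell size is an arbitrary choice):

def create_image_grid(imgs, rows, cols, cell_size=256):
    # Paste the images onto one canvas, filling it row by row.
    grid = PIL.Image.new('RGB', (cols * cell_size, rows * cell_size))
    for i, img in enumerate(imgs[:rows * cols]):
        img = img.resize((cell_size, cell_size), PIL.Image.LANCZOS)
        grid.paste(img, ((i % cols) * cell_size, (i // cols) * cell_size))
    return grid

create_image_grid(imgs, rows=3, cols=3)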

Image by Author

One advantage of GANs is that they have a smooth and continuous latent space, unlike VAEs (Variational Autoencoders), which can have gaps. So, given two points in the latent space that generate two different faces, you can create a transition, or interpolation, between the two faces by taking a linear path between the two points.

Interpolation of Latent Space [Source: Joseph Rocca]

Let's implement this in code and create a function to interpolate between two z vectors.
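A minimal linear interpolation between two latent vectors might look like this (it reuses the functions defined above):

def interpolate(z1, z2, steps=10):
    # Walk from z1 to z2 in equal steps along the straight line between them.
    return [z1 * (1 - t) + z2 * t for t in np.linspace(0.0, 1.0, steps)]

interp_imgs = generate_images(interpolate(zs[0], zs[1], steps=9), truncation_psi=0.7)
create_image_grid(interp_imgs, rows=1, cols=9)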

Let us look at the interpolation result. You can see that the first image gradually transitions to the second image.

Image by Author

Now that the interpolation works, we can finally try to turn it into an animation like the thumbnail above. We will use the moviepy library to create the video or GIF file.
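A sketch using moviepy's VideoClip; the seeds, grid size, duration, and fps are arbitrary values you can change at the top:

import moviepy.editor

grid_size = (2, 2)   # rows, cols of faces in the animation
duration_sec = 4
fps = 24

# Two sets of latent vectors: the animation morphs each start face into its end face.
start_zs = generate_zs_from_seeds([10, 11, 12, 13])
end_zs = generate_zs_from_seeds([20, 21, 22, 23])

def make_frame(t):
    # Map time t in [0, duration_sec] to an interpolation factor alpha in [0, 1].
    alpha = t / duration_sec
    zs_t = [s * (1 - alpha) + e * alpha for s, e in zip(start_zs, end_zs)]
    frame_imgs = generate_images(zs_t, truncation_psi=0.7)
    grid = create_image_grid(frame_imgs, rows=grid_size[0], cols=grid_size[1])
    return np.array(grid)  # moviepy expects an H x W x 3 uint8 array

clip = moviepy.editor.VideoClip(make_frame, duration=duration_sec)
clip.write_gif('interpolation.gif', fps=fps)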

When you run the code, it will generate an interpolated GIF animation. You can also modify the duration, grid size, or fps using the variables at the top.

StyleGAN2 interpolation GIF generated [Image by Author]

If you succeeded, congratulations! You have generated anime faces with StyleGAN2 and learned the basics of GANs and the StyleGAN architecture.

What to do next

Now that we are done, what else can you do to take this further? Here are some ideas.

  • Other datasets

Obviously, StyleGAN is not limited to anime datasets; there are many pre-trained models available for other domains, such as real faces, cats, art, and paintings.

Check out this GitHub repo (https://github.com/justinpinkney/awesome-pretrained-stylegan2) for the available pre-trained weights. Alternatively, you can also train StyleGAN on a dataset of your own choosing.

  • Conditional GAN

Currently, we cannot really control the features we want to generate, such as hair color, eye color, hairstyle, and accessories. A conditional GAN allows you to provide a label alongside the input vector z so that the generated image is conditioned on what we want. Alternatively, you can try to explore the latent space through regression or manually. If you want to go in this direction, Snow Halcy's repo may help, as he has already done this and even made it interactive in this Jupyter notebook.

Special thanks

I want to thank Gwern Branwen for his extensive articles and explanations on generating anime faces with StyleGAN, which I have referenced heavily in this article. I highly recommend visiting his website, as his work is a treasure trove of knowledge. Also, check out This Waifu Does Not Exist (https://www.thiswaifudoesnotexist.net/), which features a StyleGAN model that generates anime faces and a GPT model that generates anime plots.

