Li Hongyi 2022 Machine Learning HW6 Analysis

Preparation

Assignment 6 uses a GAN to generate anime character faces. You need the teaching assistant (TA) code; the dataset is downloaded automatically as long as you stay connected to the Internet while the code runs. If you already have the dataset, you can disable the download section of the TA code. Follow the WeChat public account to get the code, dataset, and answer (instructions are at the end of the article).

Submission address

This assignment is submitted through NTU; students who are not enrolled may not be able to submit it. If you want to discuss the assignment, you can join the QQ group: 156013866.

Simple Baseline (score>14.58)

Method: run the TA code directly.

Medium Baseline (score>18.04)

Method: modify n_epoch + n_critic. To train for more epochs, set n_epoch=50 in config. For every generator update, the discriminator is updated twice, i.e. n_critic=2.

After training, the generated images gradually improve and stabilize at first, then suddenly deteriorate after epoch 27, to the point where no faces are recognizable. The reason is that loss_G rises sharply after the 27th epoch while loss_D drops close to 0, which means the discriminator has become far too strong relative to the generator, the opposite of what GAN training aims for. A well-trained GAN ends with a small loss_G and a large loss_D, i.e. the discriminator can no longer tell the generator's images apart from real ones. The 23rd epoch performed relatively well, as shown in the figure below, but there are still many flaws, especially eyes of different sizes.

config = {
    ......
    "n_epoch": 50,
    "n_critic": 2,
    ......
}
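For reference, this is roughly where n_epoch and n_critic enter the training loop: the discriminator is updated on every batch, and the generator only once every n_critic batches. A simplified sketch, not the TA code verbatim; the generator, discriminator, optimizers, dataloader, and z_dim are assumed to come from the TA code, and the loss here is the DCGAN-style BCE form:

import torch

def train_gan(G, D, opt_G, opt_D, dataloader, config, z_dim=100, device="cuda"):
    # Illustrative loop showing how n_epoch and n_critic are used.
    criterion = torch.nn.BCEWithLogitsLoss()
    for epoch in range(config["n_epoch"]):
        for i, r_imgs in enumerate(dataloader):
            r_imgs = r_imgs.to(device)
            bs = r_imgs.size(0)

            # Discriminator step: runs on every batch.
            z = torch.randn(bs, z_dim, device=device)
            f_imgs = G(z)
            r_logit = D(r_imgs)
            f_logit = D(f_imgs.detach())
            loss_D = criterion(r_logit, torch.ones_like(r_logit)) + \
                     criterion(f_logit, torch.zeros_like(f_logit))
            opt_D.zero_grad()
            loss_D.backward()
            opt_D.step()

            # Generator step: only once every n_critic batches, so with
            # n_critic=2 the discriminator gets two updates per generator update.
            if i % config["n_critic"] == 0:
                f_logit = D(G(z))
                loss_G = criterion(f_logit, torch.ones_like(f_logit))
                opt_G.zero_grad()
                loss_G.backward()
                opt_G.step()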

Strong Baseline (score>25.20)

Two methods are used here: WGAN weight clipping and WGAN-GP.

Method 1: WGAN weight clipping. Modify the discriminator + RMSprop optimizer + loss function + weight clipping + train longer. First, because WGAN trains the discriminator as a distance (critic) function, the discriminator no longer needs the final sigmoid nonlinearity, which must be removed from the original code. Second, for WGAN with weight clipping the optimizer is usually RMSprop, which tends to work better. Third, change the loss functions of the generator and discriminator to the Wasserstein (distance) form. Fourth, perform the weight clipping operation; just uncomment the corresponding lines in the TA code. Fifth, add a clip_value item to config. For comparison with the DCGAN of the medium baseline, set n_epoch=50 and n_critic=2.

# Remove the last sigmoid layer of the discriminator; here we do it by commenting it out.
# nn.Sigmoid()

# Change the optimizer to RMSprop.
self.opt_D = torch.optim.RMSprop(self.D.parameters(), lr=self.config["lr"])
self.opt_G = torch.optim.RMSprop(self.G.parameters(), lr=self.config["lr"])

# Change the loss functions loss_D and loss_G to the Wasserstein form.
loss_D = -torch.mean(r_logit) + torch.mean(f_logit)
......
loss_G = -torch.mean(self.D(f_imgs))

# After each parameter update, apply weight clipping; this requires a clip_value item in config.
for p in self.D.parameters():
    p.data.clamp_(-self.config["clip_value"], self.config["clip_value"])

# Add clip_value to config, and set n_epoch=50, n_critic=2.
config = {
    ......
    "n_epoch": 50,
    "n_critic": 2,
    "clip_value": 1,
    ......
}

After training, I found that the results are not very stable: the images from one epoch look good, the next epoch's are poor, and the epoch after that is fine again. My guess is that weight clipping makes the parameter updates unstable. A relatively good result is the image from the 39th epoch, which is slightly better than the DCGAN result.
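Because the quality swings from one epoch to the next, it helps to render a grid from the same fixed noise batch after every epoch and pick the best checkpoint by eye. A minimal sketch, assuming the generator takes a (batch, z_dim) noise tensor and outputs images in [-1, 1]; the helper name and z_dim=100 are assumptions, not the TA code:

import os
import torch
from torchvision.utils import save_image

def save_epoch_samples(G, z_fixed, epoch, out_dir="./samples"):
    # Use the same fixed noise each epoch so the grids are directly comparable.
    os.makedirs(out_dir, exist_ok=True)
    G.eval()
    with torch.no_grad():
        imgs = (G(z_fixed) + 1) / 2.0   # map tanh output from [-1, 1] to [0, 1]
    save_image(imgs, os.path.join(out_dir, f"epoch_{epoch:03d}.jpg"), nrow=8)
    G.train()

# Example usage at the end of each epoch (z_dim=100 is an assumption):
# z_fixed = torch.randn(64, 100, device="cuda")
# save_epoch_samples(self.G, z_fixed, epoch)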

Method 2: WGAN-GP. Modify the discriminator + loss function + gradient penalty + train longer. First, the WGAN-GP discriminator still does not need the final sigmoid nonlinearity, and the nn.BatchNorm2d layers should be replaced with nn.InstanceNorm2d layers, since the gradient penalty is computed per sample and batch norm would couple samples within a batch. Second, change the loss functions of the generator and discriminator to the Wasserstein form, and when training the discriminator, add the gradient penalty term to loss_D. Third, for comparison with the DCGAN of the medium baseline, set n_epoch=50, n_critic=2. Note that the optimizer of WGAN-GP is Adam, not the RMSprop used by WGAN weight clipping.

# Remove the last sigmoid layer of the discriminator.
# nn.Sigmoid()

# Replace the nn.BatchNorm2d layer with nn.InstanceNorm2d.
def conv_bn_lrelu(self, in_dim, out_dim):
    return nn.Sequential(
        nn.Conv2d(in_dim, out_dim, 4, 2, 1),
        # nn.BatchNorm2d(out_dim),
        nn.InstanceNorm2d(out_dim),
        nn.LeakyReLU(0.2),
    )

# Write the gradient-penalty function and add it to loss_D.
# Needs (if not already imported): import numpy as np, import torch,
# from torch import autograd, from torch.autograd import Variable.
def gp(self, r_imgs, f_imgs):
    Tensor = torch.cuda.FloatTensor
    # One random interpolation coefficient per sample.
    alpha = Tensor(np.random.random((r_imgs.size(0), 1, 1, 1)))
    interpolates = (alpha * r_imgs + (1 - alpha) * f_imgs).requires_grad_(True)
    d_interpolates = self.D(interpolates)
    fake = Variable(Tensor(r_imgs.shape[0]).fill_(1.0), requires_grad=False)
    # Gradient of the critic output with respect to the interpolated images.
    gradients = autograd.grad(
        outputs=d_interpolates,
        inputs=interpolates,
        grad_outputs=fake,
        create_graph=True,
        retain_graph=True,
        only_inputs=True,
    )[0]
    gradients = gradients.view(gradients.size(0), -1)
    # Penalize gradient norms that deviate from 1
    # (note: the original WGAN-GP paper uses the 2-norm, i.e. gradients.norm(2, dim=1)).
    gradient_penalty = ((gradients.norm(1, dim=1) - 1) ** 2).mean()
    return gradient_penalty

# Change the loss functions loss_D and loss_G.
gradient_penalty = self.gp(r_imgs, f_imgs)
loss_D = -torch.mean(r_logit) + torch.mean(f_logit) + gradient_penalty
......
loss_G = -torch.mean(self.D(f_imgs))

# Set n_epoch=50, n_critic=2.
config = {
    ......
    "n_epoch": 50,
    "n_critic": 2,
    ......
}
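One detail worth flagging: in the snippet above the gradient penalty enters loss_D with an implicit weight of 1, whereas the WGAN-GP paper scales it by a coefficient λ = 10. A small sketch of the discriminator loss with an explicit coefficient; the function name and lambda_gp are illustrative, not from the TA code:

import torch

def wgan_gp_d_loss(r_logit, f_logit, gradient_penalty, lambda_gp=10.0):
    # Critic loss: Wasserstein estimate plus the weighted gradient penalty.
    # lambda_gp=10 is the value recommended in the WGAN-GP paper;
    # the snippet above effectively uses lambda_gp=1.
    return -torch.mean(r_logit) + torch.mean(f_logit) + lambda_gp * gradient_penalty

# Example: loss_D = wgan_gp_d_loss(r_logit, f_logit, self.gp(r_imgs, f_imgs))

If training still looks unstable, this coefficient is one of the first knobs to try adjusting.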

After training, the results gradually improve and stay stable, which is much better than the epoch-to-epoch "roller coaster" of DCGAN and WGAN with weight clipping. A relatively good result is the image generated at the 43rd epoch.

Boss Baseline (score>29.13)

Method: use the stylegan2 library + the Kaggle platform. For details on how to use stylegan2, see https://github.com/lucidrains/stylegan2-pytorch. The specific training settings are image_size=64, batch_size=16, num-train-steps=20000. At the beginning I timed each step, which takes close to 2 seconds; the default number of training steps is 150,000, which would take far too long. After a rough calculation I chose 20,000 steps, for a total training time close to 10 hours. The recommended training time in the homework is 5 hours, which corresponds to about 10,000 steps; our dataset has more than 70,000 images, so 10,000 steps at batch size 16 covers the whole image set only a little more than twice. To balance time and quality I set the number of steps to 20,000, but after running I found that the result at 10,000 steps is actually quite good, almost the same as at 20,000 steps.

Although stylegan2 demands a lot of computing power, the resulting images are really good. The interpolation animation at the beginning of the article was generated after training (see the code for how). The final images generated by the model are shown in the figure below, and the quality is much better than that of the previous models. After all, it comes from the graphics card maker Nvidia, whose researchers have GPU compute to spare. Recently stylegan3 has also been open-sourced, and it requires even more compute: its researchers ran the model on a GPU cluster, and the energy consumed is equivalent to a nuclear power plant running for about 15 minutes, roughly 225 MWh.

# Install the stylegan2 library.
!pip install stylegan2_pytorch

# Train the model.
!stylegan2_pytorch --data faces --image-size 64 --batch-size 16 --num-train-steps=20000

# After training, generate images with the model; you can choose which checkpoint to generate from.
!stylegan2_pytorch --generate --num_generate=4 --num_image_tiles=8

# Generate an interpolation animation.
!stylegan2_pytorch --generate-interpolation --interpolation-num-steps 100
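Besides the CLI, the stylegan2-pytorch README also describes a ModelLoader interface for generating images from Python; a sketch along the lines of the README example, where base_dir and name should match whatever you used when training:

import torch
from torchvision.utils import save_image
from stylegan2_pytorch import ModelLoader

loader = ModelLoader(
    base_dir='.',      # directory from which the CLI was run (contains ./models)
    name='default',    # project name, 'default' unless you passed --name
)

noise = torch.randn(1, 512).cuda()                       # latent noise
styles = loader.noise_to_styles(noise, trunc_psi=0.7)    # pass through the mapping network
images = loader.styles_to_images(styles)                 # run the synthesis network

save_image(images, './sample.jpg')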

How to get the answer to homework 6:

  1. Follow the WeChat public account "Machine Learning Craftsman".

  2. Reply to the account with the keyword: 202206.

Origin: blog.csdn.net/weixin_42369818/article/details/124222241