Li Hongyi Machine Learning Homework 6 - Using GAN to Generate Anime Character Faces

Theoretical part reference: Li Hongyi Machine Learning - Generative Adversarial Network (GAN)_iwill323's Blog-CSDN Blog

Table of contents

Tasks and Datasets

Evaluation method

FID

AFD (Anime face detection) rate

Code

Import packages

Create a dataset

Show some pictures

Model settings

Generator

Discriminator

Weight initialization

Training

Process

Loss function

Binary classification

Discriminator

Generator

WGAN

Training function

Train

Read data

Set config

Inference

GAN results

Tasks and Datasets

1. Input: random noise, with dimension (batch size, feature number)
2. Output: anime character faces
3. Implementation requirement: DCGAN & WGAN & WGAN-GP
4. Target: generate 1000 anime character faces

The data comes from the Crypko website and contains 71,314 images. The data can be obtained from Li Hongyi's 2022 Machine Learning HW6 Analysis_Machine Learning Craftsman's Blog-CSDN Blog

Evaluation method

FID

Feed the real and the generated images into another (pretrained) model to extract features, then compute the distance between the two feature distributions. The lower, the better.
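As a rough sketch of the idea (not the official evaluation script), the Fréchet distance can be computed from the means and covariances of the two feature sets; feats_real and feats_fake below are assumed to be numpy arrays of features extracted by that model:

from scipy.linalg import sqrtm

def frechet_distance(feats_real, feats_fake):
    # mean and covariance of each feature set
    mu1, sigma1 = feats_real.mean(axis=0), np.cov(feats_real, rowvar=False)
    mu2, sigma2 = feats_fake.mean(axis=0), np.cov(feats_fake, rowvar=False)
    # FID = ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 * sqrt(sigma1 @ sigma2))
    covmean = sqrtm(sigma1 @ sigma2)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    return np.sum((mu1 - mu2) ** 2) + np.trace(sigma1 + sigma2 - 2 * covmean)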

AFD (Anime face detection) rate

1. Detects how many of the submitted images contain an anime face
2. The higher, the better

Code

Import packages

# import module
import os
import glob
import random
from datetime import datetime

import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms
from torch import optim
from torch.utils.data import Dataset, DataLoader
from torch import autograd
from torch.autograd import Variable

import matplotlib.pyplot as plt
import numpy as np
from PIL import Image
import logging
from tqdm import tqdm

# seed setting
def same_seeds(seed):
    # Python built-in random module
    random.seed(seed)
    # Numpy
    np.random.seed(seed)
    # Torch
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True

same_seeds(2022)
workspace_dir = '../input'

Create a dataset

Note that fnames is a list of file paths, which differs from the original code; here Image.open() is used to read the data.

# prepare for CrypkoDataset

class CrypkoDataset(Dataset):
    def __init__(self, fnames, transform):
        self.transform = transform
        self.fnames = fnames
        self.num_samples = len(self.fnames)

    def __getitem__(self,idx):
        fname = self.fnames[idx]
        img = Image.open(fname)
        img = self.transform(img)
        return img

    def __len__(self):
        return self.num_samples

def get_dataset(root):
    # glob.glob returns the list of files matching the given wildcard pattern
    fnames = glob.glob(os.path.join(root, '*')) # list
    transform = transforms.Compose([        
        transforms.Resize((64, 64)),
        transforms.ToTensor(),
        transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)),
    ])
    dataset = CrypkoDataset(fnames, transform)
    return dataset

Show some pictures

temp_dataset = get_dataset(os.path.join(workspace_dir, 'faces'))

images = [temp_dataset[i] for i in range(4)]
grid_img = torchvision.utils.make_grid(images, nrow=4)
plt.figure(figsize=(10,10))
plt.imshow(grid_img.permute(1, 2, 0))
plt.show()

Model settings

Generator

The purpose of the generator is to map an input vector z to the real data space. Here our data are images, which means we need to convert the input vector z into a 3×64×64 RGB image. In practice this is done through a series of two-dimensional transposed convolutions, each followed by a 2D batch norm layer and a ReLU activation. The output of the generator is passed through a tanh function so that the output range is [−1, 1]. It is worth mentioning that placing a batch norm layer after each transposed convolution is a major contribution of the DCGAN paper; these layers help gradients flow during training.

Deconvolution reference here: ConvTranspose2d principle, how does the deep network perform upsampling? _The Blog of Flowers and Shadows under the Moon-CSDN Blog
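As a quick illustration of why the hyperparameters used below (kernel_size=5, stride=2, padding=2, output_padding=1) double the height and width, here is a minimal sketch using the modules imported above:

# output size = (input - 1) * stride - 2 * padding + kernel_size + output_padding
#             = (4 - 1) * 2 - 4 + 5 + 1 = 8, i.e. twice the input size
deconv = nn.ConvTranspose2d(16, 8, kernel_size=5, stride=2, padding=2, output_padding=1)
x = torch.randn(1, 16, 4, 4)
print(deconv(x).shape)  # torch.Size([1, 8, 8, 8])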

# Generator

class Generator(nn.Module):
    """
    Input shape: (batch, in_dim)
    Output shape: (batch, 3, 64, 64)
    """
    def __init__(self, in_dim, feature_dim=64):
        super().__init__()
    
        #input: (batch, 100)
        self.l1 = nn.Sequential(
            nn.Linear(in_dim, feature_dim * 8 * 4 * 4, bias=False),
            nn.BatchNorm1d(feature_dim * 8 * 4 * 4),
            nn.ReLU()
        )
        self.l2 = nn.Sequential(
            self.dconv_bn_relu(feature_dim * 8, feature_dim * 4),               #(batch, feature_dim * 4, 8, 8)
            self.dconv_bn_relu(feature_dim * 4, feature_dim * 2),               #(batch, feature_dim * 2, 16, 16)
            self.dconv_bn_relu(feature_dim * 2, feature_dim),                   #(batch, feature_dim, 32, 32)
        )
        self.l3 = nn.Sequential(
            nn.ConvTranspose2d(feature_dim, 3, kernel_size=5, stride=2,
                               padding=2, output_padding=1, bias=False),
            nn.Tanh()   
        )
        self.apply(weights_init)
    def dconv_bn_relu(self, in_dim, out_dim):
        return nn.Sequential(
            nn.ConvTranspose2d(in_dim, out_dim, kernel_size=5, stride=2,
                               padding=2, output_padding=1, bias=False),        #double height and width
            nn.BatchNorm2d(out_dim),
            nn.ReLU(True)
        )
    def forward(self, x):
        y = self.l1(x)
        y = y.view(y.size(0), -1, 4, 4)
        y = self.l2(y)
        y = self.l3(y)
        return y

Discriminator

The input of the discriminator is a 3×64×64 image and the output is a probability (score). The input passes through convolutional layers, BN layers, and LeakyReLU layers in turn, and the final score is produced by a sigmoid function.

The idea of WGAN is to train the discriminator as a distance (critic) function, so the discriminator does not need the final sigmoid nonlinearity.

# Discriminator
class Discriminator(nn.Module):
    """
    Input shape: (batch, 3, 64, 64)
    Output shape: (batch)
    """
    def __init__(self, model_type, in_dim, feature_dim=64):
        super(Discriminator, self).__init__()
            
        #input: (batch, 3, 64, 64)
        """
        Remove last sigmoid layer for WGAN
        """
        
        self.model_type = model_type
        
        self.l1 = nn.Sequential(
            nn.Conv2d(in_dim, feature_dim, kernel_size=4, stride=2, padding=1), #(batch, feature_dim, 32, 32)
            nn.LeakyReLU(0.2),
            self.conv_bn_lrelu(feature_dim, feature_dim * 2),                   #(batch, feature_dim * 2, 16, 16)
            self.conv_bn_lrelu(feature_dim * 2, feature_dim * 4),               #(batch, feature_dim * 4, 8, 8)
            self.conv_bn_lrelu(feature_dim * 4, feature_dim * 8),               #(batch, feature_dim * 8, 4, 4)
            nn.Conv2d(feature_dim * 8, 1, kernel_size=4, stride=1, padding=0)   #(batch, 1, 1, 1)
        )        
        
        if self.model_type == 'GAN':
            self.l1.add_module(
                'sigmoid', nn.Sigmoid() 
            )
        
        self.apply(weights_init)
        
    def conv_bn_lrelu(self, in_dim, out_dim):
        layer = nn.Sequential(
            nn.Conv2d(in_dim, out_dim, 4, 2, 1),
            nn.BatchNorm2d(out_dim),
            nn.LeakyReLU(0.2),
        )
        
        if self.model_type == 'WGAN-GP':
            layer[1] = nn.InstanceNorm2d(out_dim)
        
        return layer
    
    def forward(self, x):
        y = self.l1(x)
        y = y.view(-1)
        return y

Weight initialization

The DCGAN paper specifies that all weights are randomly initialized from a normal distribution with mean 0 and standard deviation 0.02. The weights_init function takes an initialized model and reinitializes its convolutional, transposed-convolutional, and batch-normalization layers. It is applied right after model construction.

In the initialization function of the generator and discriminator: self.apply(weights_init)

# setting for weight init function
def weights_init(m):
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:
        m.weight.data.normal_(0.0, 0.02)
    elif classname.find('BatchNorm') != -1:
        m.weight.data.normal_(1.0, 0.02)
        m.bias.data.fill_(0)
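With weights_init defined, here is a quick shape sanity check of the two networks above (a minimal sketch, not part of the original notebook; it relies on the Generator and Discriminator classes defined earlier):

G = Generator(in_dim=100)
D = Discriminator('GAN', in_dim=3)
z = torch.randn(4, 100)   # a batch of 4 latent vectors
fake = G(z)
print(fake.shape)         # torch.Size([4, 3, 64, 64])
print(D(fake).shape)      # torch.Size([4]); scores in (0, 1) because of the final sigmoid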

Training

Process

  1. prepare_environment: construct the models and create directories for the logs and ckpts
    1. in_dim = z_dim = 100: the latent vector z is drawn from a Gaussian distribution of dimension 100
    2. Because the input is an image with 3 channels, the discriminator is built as Discriminator(3)
    3. If the model contains BN layers, call model.train() during training and model.eval() during testing: model.train() makes the BN layers use the mean and variance of each batch, while model.eval() makes them use the running statistics accumulated over the whole training set
    4. Choose an optimizer according to the model type
  2. train: train the generator and the discriminator
    • When training the generator, the fake images it produces must be sent to the discriminator again to obtain fresh scores, because the discriminator has just been updated and the generator now has to fool this updated discriminator. This is what Mr. Li Hongyi described in class as the generator and discriminator "moving forward" together.
  3. inference: after training, pass the generator ckpt path to this function and it will save the generated results for you

Loss function

Binary classification

As the teacher said in class, GAN training is a min-max game, but almost no one actually trains it with gradient ascent, so practice differs somewhat from theory. The idea of GAN is closely related to binary classification, so let's first look at the binary-classification loss (binary cross-entropy), which we want to be as small as possible:

L(ŷ, y) = −[y·log ŷ + (1 − y)·log(1 − ŷ)]

When y = 1, L(ŷ, y) = −log ŷ. If ŷ is close to 1, L(ŷ, y) ≈ 0, meaning the prediction is good; if ŷ is close to 0, L(ŷ, y) ≈ +∞, meaning the prediction is bad.

When y = 0, L(ŷ, y) = −log(1 − ŷ). If ŷ is close to 0, L(ŷ, y) ≈ 0, meaning the prediction is good; if ŷ is close to 1, L(ŷ, y) ≈ +∞, meaning the prediction is bad.
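As a quick check (a minimal sketch using the modules imported above), nn.BCELoss reproduces exactly these two cases:

bce = nn.BCELoss()
y_hat = torch.tensor([0.9])
print(bce(y_hat, torch.ones(1)))   # -log(0.9)    ≈ 0.1054: good prediction for label 1
print(bce(y_hat, torch.zeros(1)))  # -log(1-0.9)  ≈ 2.3026: bad prediction for label 0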

Discriminator

The following is the discriminator objective from Mr. Li's slides, i.e. the value function

V(G, D) = E_{y∼Pdata}[log D(y)] + E_{y∼PG}[log(1 − D(y))]

Applying the binary-classification loss with ŷ = D(y): when the data is sampled from Pdata, the label is y = 1 and the loss is −log ŷ; when the data is sampled from PG, the label is y = 0 and the loss is −log(1 − ŷ). Adding the two terms gives exactly the negative of V(G, D). In other words, the discriminator can be trained directly with binary cross-entropy loss (BCELoss), where real images are labeled 1 and generated images are labeled 0:

r_label = torch.ones((bs)).to(self.device)
f_label = torch.zeros((bs)).to(self.device)
r_loss = self.loss(r_logit, r_label)
f_loss = self.loss(f_logit, f_label)
loss_D = (r_loss + f_loss) / 2

Generator

The following is the generator objective from Mr. Li's slides. Throwing away the first term of V(G, D), which does not depend on G, it becomes:

min_G E_{z∼Pz}[log(1 − D(G(z)))]

D(G(z)) lies between 0 and 1, so the minimum of log(1 − D(G(z))) is negative infinity, and the closer the loss gets to negative infinity the steeper its gradient becomes, until the gradient blows up. Therefore, in practice the generator is not trained by minimizing this objective directly with gradient descent; the following objective replaces the original generator loss (see the CS231n notes: CS231n 2022 PPT notes - Generative Modeling_iwill323's blog - CSDN blog):

max_G E_{z∼Pz}[log D(G(z))], i.e. minimize −log D(G(z))

Applying the binary-classification loss with ŷ = D(G(z)) and label y = 1, the loss is −log ŷ, so binary cross-entropy loss (BCELoss) can be used directly here as well, as long as the label is set to 1:

loss_G = self.loss(f_logit, r_label)

WGAN

The WGAN discriminator (critic) loss drops the sigmoid and the log, and simply maximizes the gap between the average scores of real and generated images:

loss_D = -torch.mean(r_logit) + torch.mean(f_logit)

and the corresponding generator loss is loss_G = -torch.mean(f_logit).

The WGAN-GP implementation follows the code from Li Hongyi's 2022 Machine Learning HW6 Analysis_Machine Learning Craftsman's Blog-CSDN Blog, but I could not get it to work: after 30 epochs it still generates noise images.
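For reference, the critic objective in the WGAN-GP paper replaces weight clipping with a gradient penalty on samples x̂ interpolated between real and generated images (λ is the penalty coefficient, 10 in the paper); note that the penalty is on the L2 norm of the critic's gradient:

L_D = E_{x̃∼PG}[D(x̃)] − E_{x∼Pdata}[D(x)] + λ·E_{x̂}[(‖∇_x̂ D(x̂)‖₂ − 1)²]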

Training function

class TrainerGAN():
    def __init__(self, config, device):
        self.config = config        
        self.model_type = self.config["model_type"]
        self.device = device
        
        self.G = Generator(self.config["z_dim"])
        self.D = Discriminator(self.model_type, 3)  # 3 is the number of input channels
                
        self.loss = nn.BCELoss()        
 
        if self.model_type == 'GAN' or self.model_type == 'WGAN-GP':
            self.opt_D = torch.optim.Adam(self.D.parameters(), lr=self.config["lr"], betas=(0.5, 0.999))
            self.opt_G = torch.optim.Adam(self.G.parameters(), lr=self.config["lr"], betas=(0.5, 0.999))
        elif self.model_type == 'WGAN':
            self.opt_D = torch.optim.RMSprop(self.D.parameters(), lr=self.config["lr"])
            self.opt_G = torch.optim.RMSprop(self.G.parameters(), lr=self.config["lr"])    
 
        self.dataloader = None
        self.log_dir = os.path.join(self.config["save_dir"], 'logs')
        self.ckpt_dir = os.path.join(self.config["save_dir"], 'checkpoints')
        
        FORMAT = '%(asctime)s - %(levelname)s: %(message)s'
        logging.basicConfig(level=logging.INFO, 
                            format=FORMAT,
                            datefmt='%Y-%m-%d %H:%M')
        
        self.steps = 0
        self.z_samples = torch.randn(100, self.config["z_dim"]).to(self.device)  # fix 100 latent vectors to track generation quality during training
        
    def prepare_environment(self):
        """
        Use this function to prepare the training environment
        """
        os.makedirs(self.log_dir, exist_ok=True)
        os.makedirs(self.ckpt_dir, exist_ok=True)
        
        # update dir by time
        time = datetime.now().strftime('%Y-%m-%d_%H-%M-%S')
        self.log_dir = os.path.join(self.log_dir, time+f'_{self.config["model_type"]}')
        self.ckpt_dir = os.path.join(self.ckpt_dir, time+f'_{self.config["model_type"]}')
        os.makedirs(self.log_dir)
        os.makedirs(self.ckpt_dir)
        
        # model preparation
        self.G = self.G.to(self.device)
        self.D = self.D.to(self.device)
        self.G.train()
        self.D.train()
        
    def gp(self, r_imgs, f_imgs):
        """
        Implement gradient penalty function
        """
        Tensor = torch.cuda.FloatTensor
        alpha = Tensor(np.random.random((r_imgs.size(0), 1, 1, 1)))
        interpolates = (alpha*r_imgs + (1 - alpha)*f_imgs).requires_grad_(True)
        d_interpolates = self.D(interpolates)
        fake = Variable(Tensor(r_imgs.shape[0]).fill_(1.0), requires_grad=False)
        gradients = autograd.grad(
            outputs=d_interpolates,
            inputs=interpolates,
            grad_outputs=fake,
            create_graph=True,
            retain_graph=True,
            only_inputs=True,
        )[0]
        
        gradients = gradients.view(gradients.size(0), -1)
        # Note: the WGAN-GP paper penalizes the L2 norm, i.e. gradients.norm(2, dim=1);
        # the L1 norm used here deviates from the paper and may be why this run only produced noise
        gradient_penalty = ((gradients.norm(1, dim=1) - 1)**2).mean()
        return gradient_penalty
        
    def train(self, dataloader):
        """
        Use this function to train generator and discriminator
        """
        self.prepare_environment()

        for e, epoch in enumerate(range(self.config["n_epoch"])):
            progress_bar = tqdm(dataloader)
            progress_bar.set_description(f"Epoch {e+1}")
            for i, data in enumerate(progress_bar):
                bs = data.size(0)  # batch size
                
                # *********************
                # *    Train D        *
                # *********************
                z = torch.randn(bs, self.config["z_dim"]).to(self.device)  # z could even be generated once before training and reused
                f_imgs = self.G(z)
                r_imgs = data.to(self.device)                
 
                # Discriminator forwarding
                r_logit = self.D(r_imgs)           # score the real images
                f_logit = self.D(f_imgs.detach())  # score the generated fake images; detach() avoids backpropagating into G
                
                # SETTING DISCRIMINATOR LOSS
                if self.model_type == 'GAN':
                    r_label = torch.ones((bs)).to(self.device)
                    f_label = torch.zeros((bs)).to(self.device)
                    r_loss = self.loss(r_logit, r_label)
                    f_loss = self.loss(f_logit, f_label)
                    loss_D = (r_loss + f_loss) / 2
                elif self.model_type == 'WGAN':
                    loss_D = -torch.mean(r_logit) + torch.mean(f_logit)
                elif self.model_type == 'WGAN-GP':
                    aa = -torch.mean(r_logit) + torch.mean(f_logit)
                    bb = self.gp(r_imgs, f_imgs)
                    loss_D = aa + bb  # the second term (bb) is the gradient penalty
 
                # Discriminator backwarding
                self.D.zero_grad()
                if self.model_type != 'WGAN-GP':
                    loss_D.backward()
                else:
                    loss_D.backward(retain_graph=True)
                self.opt_D.step()                
                
                # SETTING WEIGHT CLIP:
                if self.model_type == 'WGAN':
                    for p in self.D.parameters():
                         p.data.clamp_(-self.config["clip_value"], self.config["clip_value"])
 
                # *********************
                # *    Train G        *
                # *********************
                if self.steps % self.config["n_critic"] == 0:
                    # Generator forwarding      
                    f_logit = self.D(f_imgs)  # no need to generate f_imgs again
                    if self.model_type == 'GAN':                        
                        loss_G = self.loss(f_logit, r_label)
                    elif self.model_type == 'WGAN' or self.model_type == 'WGAN-GP':
                        loss_G = -torch.mean(f_logit)                        
 
                    # Generator backwarding
                    self.G.zero_grad()
                    loss_G.backward(retain_graph=True)
                    self.opt_G.step()               
                    
                if self.steps % 10 == 0:
                    progress_bar.set_postfix(loss_G=loss_G.item(), loss_D=loss_D.item())
                    if self.model_type == 'WGAN-GP':
                        print(aa.detach(), bb.detach())
                self.steps += 1       
 
            self.G.eval()
            # The last layer of G is tanh(), so its output lies in [-1, 1]; rescale it to [0, 1] to get valid image pixels
            f_imgs_sample = (self.G(self.z_samples).data + 1) / 2.0 
            filename = os.path.join(self.log_dir, f'Epoch_{epoch+1:03d}.jpg')
            torchvision.utils.save_image(f_imgs_sample, filename, nrow=10)
            logging.info(f'Save some samples to {filename}.')
 
            # Show some images during training.
            grid_img = torchvision.utils.make_grid(f_imgs_sample.cpu(), nrow=10)
            plt.figure(figsize=(10,10))
            plt.imshow(grid_img.permute(1, 2, 0))
            plt.show()
 
            self.G.train()
 
            if (e+1) % 5 == 0 or e == 0:
                # Save the checkpoints.
                torch.save(self.G.state_dict(), os.path.join(self.ckpt_dir, f'G_{e}.pth'))
                torch.save(self.D.state_dict(), os.path.join(self.ckpt_dir, f'D_{e}.pth'))
 
        logging.info('Finish training')
 
    def inference(self, G_path, n_generate=1000, n_output=30, show=False):
        """
        1. G_path is the path for Generator ckpt
        2. You can use this function to generate final answer
        """
 
        self.G.load_state_dict(torch.load(G_path))
        self.G.to(self.device)
        self.G.eval()
        z = torch.randn(n_generate, self.config["z_dim"]).to(self.device)
        imgs = (self.G(z).data + 1) / 2.0
        
        os.makedirs('output', exist_ok=True)
        for i in range(n_generate):
            torchvision.utils.save_image(imgs[i], f'output/{i+1}.jpg')
        
        if show:
            row, col = n_output//10 + 1, 10
            grid_img = torchvision.utils.make_grid(imgs[:n_output].cpu(), nrow=row)
            plt.figure(figsize=(row, col))
            plt.imshow(grid_img.permute(1, 2, 0))
            plt.show()

Train

Read data

# create dataset by the above function
batch_size = 512
num_workers = 2
dataset = get_dataset(os.path.join(workspace_dir, 'faces'))
dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=True, num_workers=num_workers, drop_last = True)
print('Training set size: {:d}, number of batches: {:.2f}'.format(len(dataset), len(dataset)/batch_size))

Set config

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f'DEVICE: {device}')

config = {
    "model_type": "WGAN",    
    "lr": 1e-4,
    "n_epoch": 60,
    "n_critic": 5,  # 训练一次generator,多训练几次discriminator,效果更好 n_critic=5意味着训练比是1:5
    "z_dim": 100,
    "workspace_dir": workspace_dir, # define in the environment setting
    "save_dir": workspace_dir,
    'clip_value': 1
}

trainer = TrainerGAN(config, device)
trainer.train(dataloader)

Inference

# save the 1000 images into ./output folder
trainer.inference(f'{workspace_dir}/checkpoints/2022-03-31_15-59-17_GAN/G_0.pth') # you have to modify the path when running this line

GAN results

The following are images generated by the GAN; the results are fairly mediocre. It was only trained for a short while, and the results would be much better with more tuning.

Besides the mediocre quality, you can see that around the 22nd epoch of training the generated images suddenly get worse. The previous epoch still produces normal faces (the gif below is paused there; the upper-left image is the one with red hair), and in the next epoch the output suddenly degrades. According to Li Hongyi's 2022 Machine Learning HW6 Analysis_Machine Learning Craftsman's Blog-CSDN Blog, loss_G suddenly increases while loss_D approaches 0, which shows that from then on the discriminator performs far too well relative to the generator, the opposite of the direction GAN training should move in. The ideal outcome of GAN training is that loss_G is small and loss_D is large, i.e. the discriminator cannot distinguish the generator's output from real images.

Another problem is that late in training the diversity of the generated images gets worse; the specific reason is explained by the teacher in class.

The following are images generated by WGAN; training is relatively stable up to epoch 50.

Regarding training speed, I found something interesting. With the same hyperparameters:

config = {
    "model_type": "GAN",
    "batch_size": 64,
    "lr": 1e-4,
    "n_epoch": 10,
    "n_critic": 1,
    "z_dim": 100,
    "workspace_dir": workspace_dir,
}

An Nvidia 3090 took 428 seconds, while a 3080 was faster and took only 327 seconds. I don't know why.

Theoretical part reference: Li Hongyi Machine Learning - Generative Adversarial Network (GAN)_iwill323's Blog-CSDN Blog; Understanding the Basic Principles of GAN Networks_ifreewolf99's Blog-CSDN Blog

Code reference: Understanding Generative Adversarial Networks (GAN) and DCGAN (pytorch + Li Hongyi's homework 6) - Mt. Fuji - Blog Park

Analysis of Li Hongyi's 2022 Machine Learning HW6 - Machine Learning Craftsman's Blog - CSDN Blog

Origin blog.csdn.net/iwill323/article/details/127904332