Theoretical part reference: Li Hongyi Machine Learning - Generative Adversarial Network (GAN) (iwill323's Blog, CSDN)
Tasks and Datasets
1. Input: random noise vectors with shape (batch size, number of features)
2. Output: anime character faces
3. Implementation requirement: DCGAN, WGAN, and WGAN-GP
4. Target: generate 1000 anime character faces
The data comes from the Crypko website and contains 71,314 images. It can be obtained via the post "Li Hongyi's 2022 Machine Learning HW6 Analysis" (Machine Learning Craftsman's Blog, CSDN).
Evaluation method
FID
Feed the real and the generated pictures to another pretrained model, extract a feature vector for each picture, and compute the Fréchet distance between the two feature distributions. The lower, the better.
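As a rough illustration of the distance computation, here is a minimal sketch that assumes the features have already been extracted into NumPy arrays. The function name `frechet_distance` and the random toy features are hypothetical; the official evaluation uses a pretrained network to produce the features, which is not reproduced here.

```python
import numpy as np

def frechet_distance(feats_real, feats_fake):
    """Fréchet distance between two Gaussians fitted to feature rows.

    feats_real, feats_fake: arrays of shape (n_samples, n_features).
    """
    mu_r, mu_f = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_f = np.cov(feats_fake, rowvar=False)
    # tr(sqrt(cov_r @ cov_f)) equals the sum of the square roots of the
    # eigenvalues of cov_r @ cov_f, which are real and non-negative in theory.
    eigvals = np.linalg.eigvals(cov_r @ cov_f)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_r - mu_f
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_f) - 2.0 * tr_sqrt)

rng = np.random.default_rng(0)
real = rng.normal(size=(500, 8))                 # toy stand-in for real-image features
fake_close = rng.normal(size=(500, 8))           # same distribution as the real features
fake_far = rng.normal(loc=3.0, size=(500, 8))    # shifted distribution

fid_same = frechet_distance(real, real)
fid_close = frechet_distance(real, fake_close)
fid_far = frechet_distance(real, fake_far)
```

On this toy data the distance is essentially zero for identical feature sets and grows as the generated features drift away from the real ones.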
AFD (Anime face detection) rate
1. Measures how many of the pictures in your submission are detected as anime faces
2. The higher, the better
Code
Import packages
# import module
import os
import glob
import random
from datetime import datetime
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms
from torch import optim
from torch.utils.data import Dataset, DataLoader
from torch import autograd
from torch.autograd import Variable
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image
import logging
from tqdm import tqdm
# seed setting
def same_seeds(seed):
    # Python built-in random module
    random.seed(seed)
    # Numpy
    np.random.seed(seed)
    # Torch
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True

same_seeds(2022)
workspace_dir = '../input'
Create a dataset
Note that fnames is a list of file paths. Unlike the original homework code, the images are read here with Image.open().
# prepare for CrypkoDataset
class CrypkoDataset(Dataset):
    def __init__(self, fnames, transform):
        self.transform = transform
        self.fnames = fnames
        self.num_samples = len(self.fnames)

    def __getitem__(self, idx):
        fname = self.fnames[idx]
        img = Image.open(fname)
        img = self.transform(img)
        return img

    def __len__(self):
        return self.num_samples

def get_dataset(root):
    # glob.glob returns the list of files matching the given wildcard pattern
    fnames = glob.glob(os.path.join(root, '*'))  # list
    transform = transforms.Compose([
        transforms.Resize((64, 64)),
        transforms.ToTensor(),  # pixel values in [0, 1]
        transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)),  # rescale to [-1, 1]
    ])
    dataset = CrypkoDataset(fnames, transform)
    return dataset
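The Normalize step with mean 0.5 and std 0.5 moves pixel values from [0, 1] to [-1, 1], which matches the tanh output range of the generator. A small pure-Python sketch of the mapping and its inverse (the helper names `normalize` and `denormalize` are hypothetical, introduced only for illustration):

```python
def normalize(x, mean=0.5, std=0.5):
    """Per-channel step applied by transforms.Normalize: (x - mean) / std."""
    return (x - mean) / std

def denormalize(y):
    """Inverse mapping used before saving or plotting: [-1, 1] -> [0, 1]."""
    return (y + 1) / 2.0

pixels = [0.0, 0.25, 0.5, 1.0]                 # values as produced by ToTensor()
normalized = [normalize(p) for p in pixels]    # [-1.0, -0.5, 0.0, 1.0]
restored = [denormalize(v) for v in normalized]
```

The same inverse, (x + 1) / 2, is what the training code applies to generator outputs before saving them as images.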
Show some pictures
temp_dataset = get_dataset(os.path.join(workspace_dir, 'faces'))
images = [temp_dataset[i] for i in range(4)]
grid_img = torchvision.utils.make_grid(images, nrow=4)
plt.figure(figsize=(10,10))
# the images are normalized to [-1, 1]; rescale to [0, 1] so that imshow does not clip them
plt.imshow(((grid_img + 1) / 2).permute(1, 2, 0))
plt.show()
Model settings
Generator
The generator maps the input vector z to the data space. Since our data are images, this means converting z into a 3x64x64 RGB image. In practice this is done with a series of two-dimensional transposed convolutions, where each hidden transposed convolution is followed by a two-dimensional batch norm layer and a ReLU activation. The final transposed convolution is followed by a tanh function so that the output lies in [−1,1]. Placing a batch norm layer after the transposed convolutions is one of the main contributions of the DCGAN paper; these layers help gradients flow during training.
For background on transposed convolution, see "ConvTranspose2d principle: how does a deep network perform upsampling?" (The Blog of Flowers and Shadows under the Moon, CSDN).
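To see why four such layers take a 4x4 feature map to 64x64, the output size of a transposed convolution can be computed by hand. The sketch below uses the standard size formula for nn.ConvTranspose2d with dilation 1 and the settings used in this post (kernel_size=5, stride=2, padding=2, output_padding=1); the helper name `conv_transpose_out` is hypothetical.

```python
def conv_transpose_out(size, kernel_size=5, stride=2, padding=2, output_padding=1):
    """Output height/width of nn.ConvTranspose2d with dilation=1."""
    return (size - 1) * stride - 2 * padding + kernel_size + output_padding

sizes = [4]                       # the linear layer output is reshaped to (batch, C, 4, 4)
for _ in range(4):                # three dconv_bn_relu blocks plus the final layer
    sizes.append(conv_transpose_out(sizes[-1]))
print(sizes)                      # [4, 8, 16, 32, 64]
```

With these settings every layer exactly doubles the height and width.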
# Generator
class Generator(nn.Module):
    """
    Input shape: (batch, in_dim)
    Output shape: (batch, 3, 64, 64)
    """
    def __init__(self, in_dim, feature_dim=64):
        super().__init__()
        # input: (batch, 100)
        self.l1 = nn.Sequential(
            nn.Linear(in_dim, feature_dim * 8 * 4 * 4, bias=False),
            nn.BatchNorm1d(feature_dim * 8 * 4 * 4),
            nn.ReLU()
        )
        self.l2 = nn.Sequential(
            self.dconv_bn_relu(feature_dim * 8, feature_dim * 4),  # (batch, feature_dim * 4, 8, 8)
            self.dconv_bn_relu(feature_dim * 4, feature_dim * 2),  # (batch, feature_dim * 2, 16, 16)
            self.dconv_bn_relu(feature_dim * 2, feature_dim),      # (batch, feature_dim, 32, 32)
        )
        self.l3 = nn.Sequential(
            nn.ConvTranspose2d(feature_dim, 3, kernel_size=5, stride=2,
                               padding=2, output_padding=1, bias=False),  # (batch, 3, 64, 64)
            nn.Tanh()
        )
        self.apply(weights_init)

    def dconv_bn_relu(self, in_dim, out_dim):
        return nn.Sequential(
            nn.ConvTranspose2d(in_dim, out_dim, kernel_size=5, stride=2,
                               padding=2, output_padding=1, bias=False),  # double height and width
            nn.BatchNorm2d(out_dim),
            nn.ReLU(True)
        )

    def forward(self, x):
        y = self.l1(x)
        y = y.view(y.size(0), -1, 4, 4)  # (batch, feature_dim * 8, 4, 4)
        y = self.l2(y)
        y = self.l3(y)
        return y
Discriminator
The discriminator takes a 3x64x64 image as input and outputs a score. The input passes through convolutional, BN, and LeakyReLU layers in turn, and for the original GAN a final sigmoid function turns the score into a probability.
The idea of WGAN is to train the discriminator as a distance function rather than a classifier, so the discriminator does not need the final nonlinear sigmoid layer.
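The spatial sizes inside the discriminator can be checked the same way as for the generator. This sketch uses the standard output-size formula for nn.Conv2d with dilation 1 and the layer settings that appear in the code of this post (four stride-2 convolutions with kernel_size=4 and padding=1, then a final 4x4 convolution with no padding); the helper name `conv_out` is hypothetical.

```python
def conv_out(size, kernel_size=4, stride=2, padding=1):
    """Output height/width of nn.Conv2d with dilation=1."""
    return (size + 2 * padding - kernel_size) // stride + 1

sizes = [64]
for _ in range(4):                                   # four stride-2 convolutions
    sizes.append(conv_out(sizes[-1]))
sizes.append(conv_out(sizes[-1], kernel_size=4, stride=1, padding=0))  # final conv
print(sizes)                                         # [64, 32, 16, 8, 4, 1]
```

The last convolution reduces each image to a single value, which is why the forward pass can flatten the output to shape (batch).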
# Discriminator
class Discriminator(nn.Module):
    """
    Input shape: (batch, 3, 64, 64)
    Output shape: (batch)
    """
    def __init__(self, model_type, in_dim, feature_dim=64):
        super(Discriminator, self).__init__()
        # input: (batch, 3, 64, 64)
        """
        Remove last sigmoid layer for WGAN
        """
        self.model_type = model_type
        self.l1 = nn.Sequential(
            nn.Conv2d(in_dim, feature_dim, kernel_size=4, stride=2, padding=1),  # (batch, feature_dim, 32, 32)
            nn.LeakyReLU(0.2),
            self.conv_bn_lrelu(feature_dim, feature_dim * 2),      # (batch, feature_dim * 2, 16, 16)
            self.conv_bn_lrelu(feature_dim * 2, feature_dim * 4),  # (batch, feature_dim * 4, 8, 8)
            self.conv_bn_lrelu(feature_dim * 4, feature_dim * 8),  # (batch, feature_dim * 8, 4, 4)
            nn.Conv2d(feature_dim * 8, 1, kernel_size=4, stride=1, padding=0)  # (batch, 1, 1, 1)
        )
        if self.model_type == 'GAN':
            self.l1.add_module(
                'sigmoid', nn.Sigmoid()
            )
        self.apply(weights_init)

    def conv_bn_lrelu(self, in_dim, out_dim):
        layer = nn.Sequential(
            nn.Conv2d(in_dim, out_dim, 4, 2, 1),
            nn.BatchNorm2d(out_dim),
            nn.LeakyReLU(0.2),
        )
        if self.model_type == 'WGAN-GP':
            # WGAN-GP replaces batch norm in the critic with instance norm
            layer[1] = nn.InstanceNorm2d(out_dim)
        return layer

    def forward(self, x):
        y = self.l1(x)
        y = y.view(-1)
        return y
Weight initialization
DCGAN states that all weights are randomly initialized from a normal distribution with mean 0 and standard deviation 0.02. The weights_init function takes an initialized model and reinitializes its convolutional, transposed convolutional, and batch normalization layers (the batch norm scales are drawn around 1 and the biases are set to 0). It is applied right after the layers are built.
It is called in the initialization function of both the generator and the discriminator: self.apply(weights_init)
# setting for weight init function
def weights_init(m):
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:
        m.weight.data.normal_(0.0, 0.02)
    elif classname.find('BatchNorm') != -1:
        m.weight.data.normal_(1.0, 0.02)
        m.bias.data.fill_(0)
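To check what the initializer does, it can be applied to a small throwaway model and the resulting statistics inspected. This is only a sketch: the `toy` model below is hypothetical and is not the homework network, and weights_init is repeated so that the snippet runs on its own.

```python
import torch
import torch.nn as nn

def weights_init(m):
    # Same rule as in this post: N(0, 0.02) for conv weights,
    # N(1, 0.02) for BatchNorm scales, zero BatchNorm biases.
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:
        m.weight.data.normal_(0.0, 0.02)
    elif classname.find('BatchNorm') != -1:
        m.weight.data.normal_(1.0, 0.02)
        m.bias.data.fill_(0)

torch.manual_seed(0)
toy = nn.Sequential(                      # hypothetical toy model for illustration
    nn.Conv2d(3, 64, kernel_size=4, stride=2, padding=1),
    nn.BatchNorm2d(64),
    nn.LeakyReLU(0.2),
)
toy.apply(weights_init)                   # apply() visits every submodule recursively

conv_std = toy[0].weight.std().item()     # should be close to 0.02
bn_mean = toy[1].weight.mean().item()     # should be close to 1.0
bn_bias_max = toy[1].bias.abs().max().item()
```

Because the match is done on the class name, ConvTranspose2d layers are covered by the 'Conv' branch as well, while Linear and activation layers are left untouched.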
Training
Process
- prepare_environment: construct the models and create the directories for the logs and checkpoints
  - in_dim = z_dim = 100: z is sampled from a Gaussian distribution and has 100 dimensions
  - The input is a picture with 3 channels, hence Discriminator(3)
  - If the model contains BN layers, call model.train() for training and model.eval() for testing. model.train() makes the BN layers use the mean and variance of each batch, while model.eval() makes them use the running mean and variance accumulated over the training data.
  - Choose an optimizer based on the model type
- train: train the generator and the discriminator
  - When training the generator, the fake images it produced must be sent to the discriminator again to obtain a new score. The discriminator has just been updated at this point, so the generator has to fool the updated discriminator. This is what Mr. Li Hongyi described in class as the generator and the discriminator "moving forward" together.
- inference: after training, pass the path of a generator checkpoint to this function and it will save the generated results
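The difference between model.train() and model.eval() for BN layers mentioned in the list above can be observed on a single layer. This is a minimal sketch on random toy data, not part of the training code:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(3)
x = torch.randn(8, 3) * 5 + 10            # toy batch with mean near 10 and std near 5

bn.train()
y_train = bn(x)                           # normalized with this batch's own statistics
bn.eval()
y_eval = bn(x)                            # normalized with the running estimates

train_mean = y_train.mean(dim=0)          # close to 0 for every feature
eval_mean = y_eval.mean(dim=0)            # far from 0: running stats saw only one batch
```

In training mode the output is centered because the batch statistics are used directly. In evaluation mode the layer relies on running estimates, which after a single batch are still far from the data statistics, so the output is not centered.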
Loss function
Binary classification
As the teacher said in class, GAN training is a minimax procedure, but almost no one actually uses gradient ascent, so practice differs from theory. The idea of GAN is closely related to binary classification, so let's first look at the loss function of a binary classification problem. The smaller the loss, the better.
When y=1, L(y^,y)=−log y^. If y^ is close to 1, L(y^,y)≈0 and the prediction is good; if y^ is close to 0, L(y^,y) goes to +∞ and the prediction is bad.
When y=0, L(y^,y)=−log (1−y^). If y^ is close to 0, L(y^,y)≈0 and the prediction is good; if y^ is close to 1, L(y^,y) goes to +∞ and the prediction is bad.
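The two cases can be combined into one expression, L(y^,y) = −[y log y^ + (1−y) log(1−y^)], and evaluated numerically. A small pure-Python sketch (the function name `bce` is hypothetical):

```python
import math

def bce(y_hat, y):
    """Binary cross-entropy for one prediction y_hat in (0, 1) and label y in {0, 1}."""
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

good_pos = bce(0.99, 1)   # label 1, prediction close to 1: small loss
bad_pos = bce(0.01, 1)    # label 1, prediction close to 0: large loss
good_neg = bce(0.01, 0)   # label 0, prediction close to 0: small loss
bad_neg = bce(0.99, 0)    # label 0, prediction close to 1: large loss
```

The loss is symmetric in the two classes: a confident correct prediction costs about 0.01, a confident wrong one about 4.6, and the cost keeps growing as the wrong prediction becomes more confident.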
Discriminator
The discriminator objective given in Mr. Li's slides is to maximize V(G,D) = E_{y~Pdata}[log D(y)] + E_{y~PG}[log(1 − D(y))].
Apply the binary classification loss with y^=D(y). When the sample is drawn from Pdata, the label is y=1 and the loss is −log y^; when it is drawn from PG, the label is y=0 and the loss is −log (1−y^). Adding the two gives exactly the negative of V(G,D), so maximizing V is the same as minimizing this loss. In other words, the discriminator can be trained directly with the binary cross entropy loss (BCELoss), with label 1 for real pictures and label 0 for generated pictures:
r_label = torch.ones((bs)).to(self.device)
f_label = torch.zeros((bs)).to(self.device)
r_loss = self.loss(r_logit, r_label)
f_loss = self.loss(f_logit, f_label)
loss_D = (r_loss + f_loss) / 2
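The relation between this loss and V(G,D) can be verified on a handful of numbers. The sketch below uses made-up discriminator outputs (the values in `d_real` and `d_fake` are hypothetical) and plain Python instead of BCELoss:

```python
import math

def bce_mean(probs, label):
    """Mean binary cross-entropy of probabilities against one constant label."""
    return sum(-(label * math.log(p) + (1 - label) * math.log(1 - p))
               for p in probs) / len(probs)

d_real = [0.9, 0.8, 0.7]      # hypothetical D outputs on real images
d_fake = [0.2, 0.1, 0.4]      # hypothetical D outputs on generated images

# Sample estimate of V(G, D) on this toy batch
v = (sum(math.log(p) for p in d_real) / len(d_real)
     + sum(math.log(1 - p) for p in d_fake) / len(d_fake))

loss_d = (bce_mean(d_real, 1) + bce_mean(d_fake, 0)) / 2
```

Here loss_d equals −v / 2, so minimizing loss_D and maximizing V(G,D) lead to the same discriminator update direction; the factor 1/2 only rescales the gradient.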
Generator
The generator objective given in Mr. Li's slides is to minimize max_D V(G,D) over G.
Dropping the first term of V(G,D), which does not depend on G, leaves: E_{z}[log(1 − D(G(z)))]
D(G(z)) lies between 0 and 1, and log(1 − D(G(z))) goes to negative infinity as D(G(z)) approaches 1. The problem is the shape of this curve: it is nearly flat where D(G(z)) is close to 0, which is where an untrained generator starts, and its gradient grows without bound as the value heads toward negative infinity. Therefore actual training does not minimize this objective with gradient descent. Instead, the original generator loss is replaced by the following objective (see the CS231n 2022 notes on Generative Modeling on iwill323's blog, CSDN): minimize −E_{z}[log D(G(z))]
Apply the binary classification loss with y^=D(G(z)) and label y=1. The loss is then −log (y^), so the binary cross entropy loss (BCELoss) can again be used directly, as long as the label is set to 1:
loss_G = self.loss(f_logit, r_label)
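To make the difference between the two generator objectives concrete, their derivatives with respect to d = D(G(z)) can be compared directly. This is a sketch with hand-derived derivatives; the function names are hypothetical.

```python
def grad_saturating(d):
    """Derivative of log(1 - d) with respect to d (original generator objective)."""
    return -1.0 / (1.0 - d)

def grad_non_saturating(d):
    """Derivative of -log(d) with respect to d (replacement generator loss)."""
    return -1.0 / d

for d in (0.01, 0.5, 0.99):   # d stands for D(G(z))
    print(d, grad_saturating(d), grad_non_saturating(d))
```

At d = 0.01, where the discriminator easily rejects the generated picture, the original objective has a slope of about −1 while the replacement has a slope of about −100, so the replacement gives the generator a much stronger signal early in training. Both objectives push D(G(z)) in the same direction.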
WGAN
Loss function
loss_D = -torch.mean(r_logit) + torch.mean(f_logit)
The WGAN-GP code follows "Li Hongyi's 2022 Machine Learning HW6 Analysis" (Machine Learning Craftsman's Blog, CSDN), but I have not gotten it to work: after 30 epochs it still generates noise images.
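For reference, here is a minimal, device-agnostic sketch of a gradient penalty term as it is usually described for WGAN-GP: the L2 norm of the critic's gradient at random interpolations between real and fake samples is pushed toward 1. The function name `gradient_penalty`, the toy linear critic, and the weight `lambda_gp = 10` are assumptions for illustration and are not taken from the code in this post.

```python
import torch
from torch import autograd

def gradient_penalty(critic, r_imgs, f_imgs):
    """Mean of (||grad_x critic(x)||_2 - 1)^2 at random points between real and fake."""
    alpha = torch.rand(r_imgs.size(0), 1, 1, 1, device=r_imgs.device)
    interpolates = (alpha * r_imgs + (1 - alpha) * f_imgs.detach()).requires_grad_(True)
    scores = critic(interpolates)
    gradients = autograd.grad(
        outputs=scores,
        inputs=interpolates,
        grad_outputs=torch.ones_like(scores),
        create_graph=True,            # the penalty itself must be differentiable
    )[0]
    gradients = gradients.reshape(gradients.size(0), -1)
    return ((gradients.norm(2, dim=1) - 1) ** 2).mean()

# Toy check with a linear critic D(x) = <w, x>, whose input gradient is w everywhere.
torch.manual_seed(0)
w = torch.randn(3, 8, 8)
critic = lambda x: (x * w).sum(dim=(1, 2, 3))
penalty = gradient_penalty(critic, torch.randn(4, 3, 8, 8), torch.randn(4, 3, 8, 8))
expected = (w.norm() - 1) ** 2
lambda_gp = 10                        # assumed weight for the penalty term
```

In a critic loss this term would typically be added as -mean(D(real)) + mean(D(fake)) + lambda_gp * penalty. For the linear toy critic the penalty equals (||w|| − 1)² regardless of the inputs, which makes the function easy to sanity-check.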
Training function
class TrainerGAN():
    def __init__(self, config, device):
        self.config = config
        self.model_type = self.config["model_type"]
        self.device = device
        self.G = Generator(self.config["z_dim"])
        self.D = Discriminator(self.model_type, 3)  # 3 is the number of input channels
        self.loss = nn.BCELoss()
        if self.model_type == 'GAN' or self.model_type == 'WGAN-GP':
            self.opt_D = torch.optim.Adam(self.D.parameters(), lr=self.config["lr"], betas=(0.5, 0.999))
            self.opt_G = torch.optim.Adam(self.G.parameters(), lr=self.config["lr"], betas=(0.5, 0.999))
        elif self.model_type == 'WGAN':
            self.opt_D = torch.optim.RMSprop(self.D.parameters(), lr=self.config["lr"])
            self.opt_G = torch.optim.RMSprop(self.G.parameters(), lr=self.config["lr"])
        self.dataloader = None
        self.log_dir = os.path.join(self.config["save_dir"], 'logs')
        self.ckpt_dir = os.path.join(self.config["save_dir"], 'checkpoints')
        FORMAT = '%(asctime)s - %(levelname)s: %(message)s'
        logging.basicConfig(level=logging.INFO,
                            format=FORMAT,
                            datefmt='%Y-%m-%d %H:%M')
        self.steps = 0
        # fixed noise: 100 samples used to visualize the generator after every epoch
        self.z_samples = torch.randn(100, self.config["z_dim"]).to(self.device)
    def prepare_environment(self):
        """
        Use this function to prepare the environment
        """
        os.makedirs(self.log_dir, exist_ok=True)
        os.makedirs(self.ckpt_dir, exist_ok=True)
        # update dir by time
        time = datetime.now().strftime('%Y-%m-%d_%H-%M-%S')
        self.log_dir = os.path.join(self.log_dir, time+f'_{self.config["model_type"]}')
        self.ckpt_dir = os.path.join(self.ckpt_dir, time+f'_{self.config["model_type"]}')
        os.makedirs(self.log_dir)
        os.makedirs(self.ckpt_dir)
        # model preparation
        self.G = self.G.to(self.device)
        self.D = self.D.to(self.device)
        self.G.train()
        self.D.train()
    def gp(self, r_imgs, f_imgs):
        """
        Implement gradient penalty function
        """
        # one random mixing coefficient per sample, created on the same device as the images
        alpha = torch.rand(r_imgs.size(0), 1, 1, 1, device=r_imgs.device)
        interpolates = (alpha*r_imgs + (1 - alpha)*f_imgs).requires_grad_(True)
        d_interpolates = self.D(interpolates)
        fake = torch.ones(r_imgs.shape[0], device=r_imgs.device)
        gradients = autograd.grad(
            outputs=d_interpolates,
            inputs=interpolates,
            grad_outputs=fake,
            create_graph=True,
            retain_graph=True,
            only_inputs=True,
        )[0]
        gradients = gradients.view(gradients.size(0), -1)
        # WGAN-GP penalizes the L2 norm of the gradient
        gradient_penalty = ((gradients.norm(2, dim=1) - 1)**2).mean()
        return gradient_penalty
    def train(self, dataloader):
        """
        Use this function to train generator and discriminator
        """
        self.prepare_environment()
        for e, epoch in enumerate(range(self.config["n_epoch"])):
            progress_bar = tqdm(dataloader)
            progress_bar.set_description(f"Epoch {e+1}")
            for i, data in enumerate(progress_bar):
                bs = data.size(0)  # batch size
                # *********************
                # *      Train D      *
                # *********************
                z = torch.randn(bs, self.config["z_dim"]).to(self.device)  # z could even be sampled once before training and reused
                f_imgs = self.G(z)
                r_imgs = data.to(self.device)
                # Discriminator forwarding
                r_logit = self.D(r_imgs)  # score the real images
                f_logit = self.D(f_imgs.detach())  # score the generated images; detach() avoids backpropagating into G
                # SETTING DISCRIMINATOR LOSS
                if self.model_type == 'GAN':
                    r_label = torch.ones((bs)).to(self.device)
                    f_label = torch.zeros((bs)).to(self.device)
                    r_loss = self.loss(r_logit, r_label)
                    f_loss = self.loss(f_logit, f_label)
                    loss_D = (r_loss + f_loss) / 2
                elif self.model_type == 'WGAN':
                    loss_D = -torch.mean(r_logit) + torch.mean(f_logit)
                elif self.model_type == 'WGAN-GP':
                    aa = -torch.mean(r_logit) + torch.mean(f_logit)
                    bb = self.gp(r_imgs, f_imgs)
                    loss_D = aa + bb  # the last term is the gradient penalty
                # Discriminator backwarding
                self.D.zero_grad()
                if self.model_type != 'WGAN-GP':
                    loss_D.backward()
                else:
                    loss_D.backward(retain_graph=True)
                self.opt_D.step()
                # SETTING WEIGHT CLIP:
                if self.model_type == 'WGAN':
                    for p in self.D.parameters():
                        p.data.clamp_(-self.config["clip_value"], self.config["clip_value"])
                # *********************
                # *      Train G      *
                # *********************
                if self.steps % self.config["n_critic"] == 0:
                    # Generator forwarding
                    f_logit = self.D(f_imgs)  # no need to generate f_imgs again
                    if self.model_type == 'GAN':
                        loss_G = self.loss(f_logit, r_label)
                    elif self.model_type == 'WGAN' or self.model_type == 'WGAN-GP':
                        loss_G = -torch.mean(f_logit)
                    # Generator backwarding
                    self.G.zero_grad()
                    loss_G.backward(retain_graph=True)
                    self.opt_G.step()
                if self.steps % 10 == 0:
                    progress_bar.set_postfix(loss_G=loss_G.item(), loss_D=loss_D.item())
                    # aa and bb only exist in the WGAN-GP branch
                    if self.model_type == 'WGAN-GP':
                        print(aa.detach(), bb.detach())
                self.steps += 1
            self.G.eval()
            # The last layer of G is tanh, so its output lies in [-1, 1] and must be rescaled to [0, 1] to form an image
            f_imgs_sample = (self.G(self.z_samples).data + 1) / 2.0
            filename = os.path.join(self.log_dir, f'Epoch_{epoch+1:03d}.jpg')
            torchvision.utils.save_image(f_imgs_sample, filename, nrow=10)
            logging.info(f'Save some samples to {filename}.')
            # Show some images during training.
            grid_img = torchvision.utils.make_grid(f_imgs_sample.cpu(), nrow=10)
            plt.figure(figsize=(10,10))
            plt.imshow(grid_img.permute(1, 2, 0))
            plt.show()
            self.G.train()
            if (e+1) % 5 == 0 or e == 0:
                # Save the checkpoints.
                torch.save(self.G.state_dict(), os.path.join(self.ckpt_dir, f'G_{e}.pth'))
                torch.save(self.D.state_dict(), os.path.join(self.ckpt_dir, f'D_{e}.pth'))
        logging.info('Finish training')
    def inference(self, G_path, n_generate=1000, n_output=30, show=False):
        """
        1. G_path is the path for Generator ckpt
        2. You can use this function to generate final answer
        """
        self.G.load_state_dict(torch.load(G_path))
        self.G.to(self.device)  # the trainer stores a single device in self.device
        self.G.eval()
        z = torch.randn(n_generate, self.config["z_dim"]).to(self.device)
        imgs = (self.G(z).data + 1) / 2.0
        os.makedirs('output', exist_ok=True)
        for i in range(n_generate):
            torchvision.utils.save_image(imgs[i], f'output/{i+1}.jpg')
        if show:
            row, col = n_output//10 + 1, 10
            grid_img = torchvision.utils.make_grid(imgs[:n_output].cpu(), nrow=row)
            plt.figure(figsize=(row, col))
            plt.imshow(grid_img.permute(1, 2, 0))
            plt.show()
Train
Read data
# create dataset by the above function
batch_size = 512
num_workers = 2
dataset = get_dataset(os.path.join(workspace_dir, 'faces'))
dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=True, num_workers=num_workers, drop_last = True)
print('Training set size: {:d}, number of batches: {:.2f}'.format(len(dataset), len(dataset)/batch_size))
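Note that the printed batch count is a plain division, while drop_last=True makes the loader discard the final incomplete batch. A quick check with the dataset size stated earlier in this post (71,314 images):

```python
n_images = 71314      # dataset size stated above
batch_size = 512

full_batches, leftover = divmod(n_images, batch_size)
print(full_batches, leftover)   # 139 146
```

So each epoch runs 139 full batches, and the remaining 146 images are skipped in that epoch (a different subset is skipped each time because shuffle=True).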
Set config
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f'DEVICE: {device}')
config = {
    "model_type": "WGAN",
    "lr": 1e-4,
    "n_epoch": 60,
    "n_critic": 5,  # train the discriminator several times per generator update; n_critic=5 means a 1:5 ratio
    "z_dim": 100,
    "workspace_dir": workspace_dir,  # define in the environment setting
    "save_dir": workspace_dir,
    'clip_value': 1
}
trainer = TrainerGAN(config, device)
trainer.train(dataloader)
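The effect of n_critic on the update schedule can be traced with a few lines of plain Python. The sketch mirrors the condition `self.steps % self.config["n_critic"] == 0` used in the training loop; the 12-step range is just an illustrative window.

```python
n_critic = 5
steps = range(12)                                   # global step counter, like self.steps

g_steps = [s for s in steps if s % n_critic == 0]   # steps where G is also updated
d_updates, g_updates = len(steps), len(g_steps)
print(g_steps, d_updates, g_updates)                # [0, 5, 10] 12 3
```

The discriminator is updated at every step, while the generator is updated only at steps 0, 5, 10, and so on, which gives the 1:5 ratio mentioned in the config comment.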
Inference
# save the 1000 images into ./output folder
trainer.inference(f'{workspace_dir}/checkpoints/2022-03-31_15-59-17_GAN/G_0.pth') # you have to modify the path when running this line
GAN results
Below are the pictures generated by GAN. The quality is mediocre; this was only a short run, and it would improve considerably with further tuning.
Besides the mediocre quality, the images suddenly get worse at epoch 22 of training. The previous epoch still shows normal portraits (the paused frame in the gif below, with a red-haired face in the upper left corner), and the next epoch abruptly degrades. According to "Li Hongyi's 2022 Machine Learning HW6 Analysis" (Machine Learning Craftsman's Blog, CSDN), loss_G suddenly increases while loss_D gets close to 0, which shows that the discriminator has become too strong relative to the generator. This is the opposite of what GAN training aims for: the best outcome is that loss_G is small and loss_D is large, that is, the discriminator cannot distinguish the generator's output from real data.
Another problem is that the diversity of the generated images decreases as training goes on. The teacher explains the reason in class.
Below are the images generated by WGAN, which stays relatively stable up to epoch 50.
Regarding training speed, I noticed something interesting. With the same hyperparameters:
config = {
    "model_type": "GAN",
    "batch_size": 64,
    "lr": 1e-4,
    "n_epoch": 10,
    "n_critic": 1,
    "z_dim": 100,
    "workspace_dir": workspace_dir,
}
the run takes 428 seconds on an Nvidia 3090 graphics card, while a 3080 is faster and takes only 327 seconds. I don't know why.
Theoretical part reference: Li Hongyi Machine Learning - Generative Adversarial Network (GAN) (iwill323's Blog, CSDN); Understanding the Basic Principles of GAN Network (ifreewolf99's Blog, CSDN)
Code reference: Understanding generative adversarial networks GAN and DCGAN (PyTorch + Li Hongyi teacher homework 6) (Mt. Fuji, Blog Park); Analysis of Li Hongyi's 2022 Machine Learning HW6 (Machine Learning Craftsman's Blog, CSDN)