1. Introduction to DCGAN
DCGAN stands for Deep Convolutional Generative Adversarial Networks.
Paper address
A CNN is a convolutional neural network. In DCGAN, the discriminator is essentially a CNN: it takes an image as input and outputs the probability that the image is real (yes) or fake (no).
So what does the G network (the generator) look like in DCGAN?
The G network works in the opposite direction of a CNN: it turns a noise vector into an image. Because the feature maps grow larger layer by layer, the reverse of what convolution does, this operation is often called deconvolution.
1.1 Features of DCGAN
Besides the generator being the reverse of a CNN, DCGAN makes the following architectural changes:
1. All pooling layers are removed. The G network upsamples with transposed convolution layers, and the D network replaces pooling with strided convolutions.
2. Batch Normalization is applied to every layer except the output layer of the generator and the input layer of the discriminator. BN stabilizes learning and helps with training problems caused by poor initialization.
3. The G network uses ReLU as the activation function in all layers except the last one, which uses tanh.
4. The D network uses LeakyReLU as the activation function.
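Items 2 to 4 can be made concrete with a minimal pure-Python sketch. It works on scalars only and does not use PyTorch. It covers the activations DCGAN uses and the batch-normalization step.

- The 0.2 slope matches `nn.LeakyReLU(0.2)` in the code later in this post.
- `normalize` repeats the arithmetic of `transforms.Normalize` with mean 0.5 and std 0.5 from the training script. This shows why real images land in the same [-1, 1] range as the generator's tanh output.
- All function names here are illustrative.
- `batch_norm` is a simplified training-mode forward pass. It does not track the running statistics that `nn.BatchNorm2d` also keeps.

```python
import math


def relu(x: float) -> float:
    # G network hidden layers: negative inputs become 0
    return max(0.0, x)


def leaky_relu(x: float, negative_slope: float = 0.2) -> float:
    # D network layers: negative inputs keep a small slope instead of being zeroed
    return x if x >= 0 else negative_slope * x


def tanh(x: float) -> float:
    # G network output layer: squashes values into (-1, 1)
    return math.tanh(x)


def normalize(pixel: float, mean: float = 0.5, std: float = 0.5) -> float:
    # Same arithmetic as transforms.Normalize: maps [0, 1] pixels to [-1, 1]
    return (pixel - mean) / std


def batch_norm(values, gamma: float = 1.0, beta: float = 0.0, eps: float = 1e-5):
    # Normalize a batch to zero mean / unit variance, then scale by gamma and shift by beta
    # (training-mode forward pass only; no running statistics)
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]


if __name__ == "__main__":
    print(relu(-1.5), leaky_relu(-1.5), tanh(3.0))
    print(normalize(0.0), normalize(1.0))  # -1.0 1.0
    print(batch_norm([1.0, 2.0, 3.0, 4.0]))
```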
2. Several important concepts
2.1 Downsampling (Subsampling)
Downsampling shrinks an image. In classic image processing, its main purpose is to fit an image into a display area or to generate a thumbnail. In a CNN, both pooling layers and (strided) convolution layers downsample. However, the size reduction caused by convolution is a by-product of extracting features, whereas pooling is used specifically to reduce the dimensionality of the features.
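Pooling-based downsampling can be illustrated with a small pure-Python sketch. It applies 2×2 max pooling with stride 2 to a 4×4 single-channel image; `max_pool2d` is an illustrative name, not a library API. Each output pixel keeps only the largest value in its window, so the height and width are halved.

```python
from typing import List


def max_pool2d(image: List[List[float]], size: int = 2, stride: int = 2) -> List[List[float]]:
    # Slide a size x size window with the given stride and keep the maximum of each window
    h, w = len(image), len(image[0])
    out = []
    for i in range(0, h - size + 1, stride):
        row = []
        for j in range(0, w - size + 1, stride):
            window = [image[i + di][j + dj] for di in range(size) for dj in range(size)]
            row.append(max(window))
        out.append(row)
    return out


image = [
    [1, 3, 2, 0],
    [4, 6, 5, 1],
    [7, 2, 9, 8],
    [0, 1, 3, 4],
]
pooled = max_pool2d(image)
print(pooled)  # [[6, 5], [7, 9]]
```

The rule here is fixed and has no parameters. DCGAN's discriminator replaces it with a learned stride-2 convolution.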
2.2 Upsampling
Where there is downsampling, there is also upsampling. Upsampling enlarges an image; the term covers any technique that raises an image's resolution. This also explains how the G network can produce a full picture from noise: it upsamples step by step.
Common upsampling methods include deconvolution and unpooling. Here we only introduce deconvolution, because that is what DCGAN uses.
2.3 Deconvolution
Deconvolution is also known as fractionally strided convolution or transposed convolution. In the figure below, the left side shows convolution and the right side shows deconvolution. The convolution maps a 4×4 image to a 2×2 image, and the deconvolution maps a 2×2 image back to a 4×4 image; both use a 3×3 kernel. Note, however, that deconvolution restores only the size of the image, not its original pixel values.

This raises a question. The kernels of a CNN's convolution layers are learned, so can the kernels of a deconvolution layer be learned too? They can, and that is exactly how the G network learns to generate images.

For more details on transposed convolution, please refer to another blog post.
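The 4×4 → 2×2 → 4×4 example can be reproduced with a minimal pure-Python sketch. It assumes stride 1, no padding and a single channel.

- `conv2d` slides the 3×3 kernel over the input. Like deep-learning frameworks, it computes a cross-correlation.
- `conv_transpose2d` instead lets every input pixel scatter a scaled copy of the kernel into the output. That is why it enlarges 2×2 back to 4×4.
- The all-ones kernel is an arbitrary choice for readability; in DCGAN the kernel would be learned.
- The helper names are illustrative.

```python
from typing import List

Matrix = List[List[float]]


def conv2d(x: Matrix, k: Matrix) -> Matrix:
    # Stride-1, no-padding convolution: output is (H - kh + 1) x (W - kw + 1)
    kh, kw = len(k), len(k[0])
    oh, ow = len(x) - kh + 1, len(x[0]) - kw + 1
    return [[sum(x[i + a][j + b] * k[a][b] for a in range(kh) for b in range(kw))
             for j in range(ow)]
            for i in range(oh)]


def conv_transpose2d(y: Matrix, k: Matrix) -> Matrix:
    # Stride-1 transposed convolution: output is (H + kh - 1) x (W + kw - 1)
    kh, kw = len(k), len(k[0])
    out = [[0.0] * (len(y[0]) + kw - 1) for _ in range(len(y) + kh - 1)]
    for i in range(len(y)):
        for j in range(len(y[0])):
            for a in range(kh):
                for b in range(kw):
                    out[i + a][j + b] += y[i][j] * k[a][b]
    return out


def dot(p: Matrix, q: Matrix) -> float:
    # Sum of element-wise products of two same-shaped matrices
    return sum(pv * qv for pr, qr in zip(p, q) for pv, qv in zip(pr, qr))


x = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
kernel = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
y = conv2d(x, kernel)                   # 4x4 -> 2x2: [[54, 63], [90, 99]]
restored = conv_transpose2d(y, kernel)  # 2x2 -> 4x4
print(y)
print(restored[0])                      # [54.0, 117.0, 117.0, 63.0]

# Why "transposed": <conv2d(x), g> equals <x, conv_transpose2d(g)> for any 2x2 g
g = [[0.5, -1.0], [2.0, 3.0]]
print(dot(conv2d(x, kernel), g), dot(x, conv_transpose2d(g, kernel)))
```

The first row of `restored` is `[54.0, 117.0, 117.0, 63.0]`, not `[1, 2, 3, 4]`: the 4×4 size comes back but the pixel values do not. The last line checks the property behind the name. Written as a matrix acting on the flattened image, this operation is the transpose of the convolution.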
3. G model
The figure below shows the general framework of DCGAN: the generator uses deconvolution to generate images, and the discriminator uses convolution to classify them.
The next figure shows the DCGAN generator introduced in the DCGAN paper. The network takes a 100-dimensional noise vector, denoted z, passes it through a series of layers, and finally maps it to a 64×64×3 image.
In other words, the whole process turns a 1×100 vector into a 64×64×3 picture:
1. Project and reshape: map the 1×100 vector to 4×4×1024 feature maps. This can be implemented as a fully connected layer whose output is reshaped into 1024 channels of size 4×4.
2. CONV: a stack of deconvolution (transposed convolution) layers, 4×4×1024 → 8×8×512 → 16×16×256 → 32×32×128 → 64×64×3.
3. Implementation code:
Generator:
import torch
import torch.nn as nn
from torch import Tensor
from typing import List


class Generator(nn.Module):
    def __init__(self, in_channel: int, out_channel: int,
                 kernel_size: List[int] = [5, 5, 5, 5],
                 padding: List[int] = [2, 2, 2, 2],
                 stride: List[int] = [2, 2, 2, 2]):
        super(Generator, self).__init__()
        self.in_channel = in_channel
        self.last_out_channel = out_channel
        self.kernel_size = kernel_size
        self.padding = padding
        self.stride = stride
        # Project and reshape: z (in_channel-dim) -> 1024*4*4, followed by BN + ReLU
        self.project_layer = nn.Linear(self.in_channel, 1024 * 4 * 4)
        self.project_bn = nn.Sequential(nn.BatchNorm2d(1024), nn.ReLU(inplace=True))
        self.in_channel = 1024
        self.main = self._make_layers(self.kernel_size, self.padding, self.stride)
        # DCGAN initialization: weights ~ N(0, 0.02). ConvTranspose2d is not a
        # subclass of nn.Conv2d, so it must be listed explicitly.
        for m in self.modules():
            if isinstance(m, (nn.Conv2d, nn.ConvTranspose2d)):
                nn.init.normal_(m.weight, 0.0, 0.02)
            if isinstance(m, nn.BatchNorm2d):
                nn.init.normal_(m.weight, 1.0, 0.02)
                nn.init.zeros_(m.bias)

    def _make_layers(self, kernel_size: List[int], padding: List[int], stride: List[int]):
        layers = []
        # Each block doubles H and W ((H-1)*2 - 2*2 + 5 + 1 = 2H) and halves the channels
        for i in range(len(kernel_size) - 1):
            self.out_channel = self.in_channel // 2
            layers.extend([
                nn.ConvTranspose2d(self.in_channel, self.out_channel, kernel_size[i],
                                   stride=stride[i], padding=padding[i],
                                   output_padding=1, bias=False),
                nn.BatchNorm2d(self.out_channel),
                nn.ReLU(inplace=True),
            ])
            self.in_channel = self.out_channel
        # Output layer: no BatchNorm; tanh maps pixel values into [-1, 1]
        layers.extend([
            nn.ConvTranspose2d(self.in_channel, self.last_out_channel, kernel_size[-1],
                               stride=stride[-1], padding=padding[-1], output_padding=1),
            nn.Tanh(),
        ])
        return nn.Sequential(*layers)

    def forward(self, inputs: Tensor) -> Tensor:
        batch_size = inputs.shape[0]
        proj = self.project_layer(inputs)
        reshape_proj = torch.reshape(proj, (batch_size, 1024, 4, 4))
        out = self.main(self.project_bn(reshape_proj))
        return out


if __name__ == "__main__":
    generator = Generator(100, 3)
    print(generator)
    print(generator(torch.randn(2, 100)).shape)  # torch.Size([2, 3, 64, 64])
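The shapes produced by the generator above can be checked by hand. The sketch below is plain Python, and its helper names are illustrative.

It applies the output-size formula from the PyTorch `ConvTranspose2d` documentation, with dilation fixed at 1. The settings are the defaults used above: kernel_size=5, stride=2, padding=2 and output_padding=1.

Each block doubles the height and width, since (H − 1)·2 − 2·2 + 4 + 1 + 1 = 2H. The channel count halves at each step, which matches the 4×4×1024 → 64×64×3 path of the paper's generator.

```python
from typing import List, Tuple


def conv_transpose_out(size: int, kernel: int = 5, stride: int = 2,
                       padding: int = 2, output_padding: int = 1) -> int:
    # H_out = (H_in - 1) * stride - 2 * padding + (kernel - 1) + output_padding + 1
    return (size - 1) * stride - 2 * padding + (kernel - 1) + output_padding + 1


def generator_shapes(z_dim: int = 100, base_channels: int = 1024,
                     out_channels: int = 3, num_layers: int = 4) -> List[Tuple[int, ...]]:
    shapes: List[Tuple[int, ...]] = [(z_dim,)]
    channels, size = base_channels, 4
    shapes.append((channels, size, size))  # project_layer + reshape: 100 -> 1024*4*4
    for layer in range(num_layers):
        last = layer == num_layers - 1
        channels = out_channels if last else channels // 2
        size = conv_transpose_out(size)
        shapes.append((channels, size, size))
    return shapes


for shape in generator_shapes():
    print(shape)
# (100,) -> (1024, 4, 4) -> (512, 8, 8) -> (256, 16, 16) -> (128, 32, 32) -> (3, 64, 64)
```

`output_padding=1` is what makes each layer produce exactly 2H. Without it, the same formula gives 2H − 1.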
Discriminator:
import torch
import torch.nn as nn
from torch import Tensor
from typing import List


class Discriminator(nn.Module):
    def __init__(self, in_channel: int, last_out_channel: int,
                 stride: List[int] = [2, 2, 2, 2],
                 padding: List[int] = [2, 2, 2, 2],
                 kernel_size: List[int] = [5, 5, 5, 5]):
        super(Discriminator, self).__init__()
        self.main = self._make_layer(in_channel, last_out_channel, stride, padding, kernel_size)
        # Four stride-2 convolutions turn a 64x64 input into last_out_channel x 4 x 4
        self.fc1 = nn.Linear(4 * 4 * last_out_channel, 1)
        self.sigmoid = nn.Sigmoid()
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.normal_(m.weight, 0.0, 0.02)
            if isinstance(m, nn.BatchNorm2d):
                nn.init.normal_(m.weight, 1.0, 0.02)
                nn.init.zeros_(m.bias)

    def _make_layer(self, in_channel, last_out_channel, stride: List[int],
                    padding: List[int], kernel_size: List[int]):
        layers = []
        # Each stride-2 conv halves H and W: (H + 2*2 - 5) // 2 + 1
        for i in range(len(stride) - 1):
            out_channel = max(in_channel * 2, 64)
            # Bias only on the input layer, the one layer not followed by BatchNorm
            layers.append(nn.Conv2d(in_channel, out_channel, kernel_size=kernel_size[i],
                                    stride=stride[i], padding=padding[i], bias=(i == 0)))
            if i > 0:  # no BatchNorm on the discriminator's input layer
                layers.append(nn.BatchNorm2d(out_channel))
            layers.append(nn.LeakyReLU(0.2, inplace=True))
            in_channel = out_channel
        # Last conv block, with BN + LeakyReLU before the linear classifier
        layers.extend([
            nn.Conv2d(in_channel, last_out_channel, kernel_size=kernel_size[-1],
                      stride=stride[-1], padding=padding[-1], bias=False),
            nn.BatchNorm2d(last_out_channel),
            nn.LeakyReLU(0.2, inplace=True),
        ])
        return nn.Sequential(*layers)

    def forward(self, inputs: Tensor) -> Tensor:
        out = self.main(inputs)
        out = torch.flatten(out, start_dim=1)
        out = self.sigmoid(self.fc1(out))  # probability that each input image is real
        return out


if __name__ == "__main__":
    NetD = Discriminator(3, last_out_channel=512)
    print(NetD)
    print(NetD(torch.randn(2, 3, 64, 64)).shape)  # torch.Size([2, 1])
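The same bookkeeping for the discriminator explains the `4*4*512` input size of `fc1`. The sketch below is plain Python with illustrative helper names. It assumes the defaults above: kernel_size=5, stride=2, padding=2.

- It uses the `Conv2d` output-size formula floor((H + 2·padding − kernel) / stride) + 1, with dilation 1.
- A 64×64 input is halved four times.
- The channel rule `max(in_channel * 2, 64)` gives 3 → 64 → 128 → 256.
- The final layer then outputs `last_out_channel` = 512 channels.

```python
from typing import List, Tuple


def conv_out(size: int, kernel: int = 5, stride: int = 2, padding: int = 2) -> int:
    # H_out = floor((H_in + 2 * padding - kernel) / stride) + 1
    return (size + 2 * padding - kernel) // stride + 1


def discriminator_shapes(in_channels: int = 3, image_size: int = 64,
                         last_out_channel: int = 512,
                         num_layers: int = 4) -> List[Tuple[int, int, int]]:
    shapes = [(in_channels, image_size, image_size)]
    channels, size = in_channels, image_size
    for layer in range(num_layers):
        last = layer == num_layers - 1
        channels = last_out_channel if last else max(channels * 2, 64)
        size = conv_out(size)
        shapes.append((channels, size, size))
    return shapes


shapes = discriminator_shapes()
for shape in shapes:
    print(shape)
c, h, w = shapes[-1]
print("fc1 in_features:", c * h * w)  # 8192 == 4 * 4 * 512
```

A 128×128 input would instead end at 512×8×8, so `fc1` as written only fits 64×64 images.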
Training script:
import torch
import torch.nn as nn
from data_set import MyDataset
from torch.utils.data import DataLoader
from torchvision import transforms
from tqdm import tqdm
from model.generater import Generator
from model.discriminator import Discriminator


def train(epochs: int = 10, lr: float = 2e-4):
    real_label = 1.0
    fake_label = 0.0
    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    # Scale pixels from [0, 1] to [-1, 1] to match the generator's tanh output
    my_transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
    ])
    # The D network's fc1 layer requires 3x64x64 image tensors
    dataset = MyDataset(transform=my_transform)
    train_loader = DataLoader(dataset, batch_size=32, shuffle=True)
    NetD = Discriminator(3, last_out_channel=512).to(device)
    NetG = Generator(100, 3).to(device)
    loss_func = nn.BCELoss()
    optimizer_D = torch.optim.Adam(NetD.parameters(), lr=lr, betas=(0.5, 0.999))
    optimizer_G = torch.optim.Adam(NetG.parameters(), lr=lr, betas=(0.5, 0.999))
    for epoch in range(epochs):
        train_bar = tqdm(train_loader)
        for data in train_bar:
            data = data.to(device)
            b_size = data.shape[0]
            # (1) Update D. Real and fake batches go through D separately,
            # as recommended by ganhacks
            optimizer_D.zero_grad()
            label = torch.full((b_size,), real_label, dtype=torch.float32, device=device)
            output = NetD(data).view(-1)
            loss_D_real = loss_func(output, label)
            loss_D_real.backward()
            noise = torch.randn(b_size, 100, device=device)
            fake = NetG(noise)
            label.fill_(fake_label)
            output = NetD(fake.detach()).view(-1)  # detach: no gradient flows into G here
            loss_D_fake = loss_func(output, label)
            loss_D_fake.backward()
            loss_D_all = loss_D_real + loss_D_fake
            optimizer_D.step()
            # (2) Update G: score the fakes against the real label
            optimizer_G.zero_grad()
            label.fill_(real_label)
            output = NetD(fake).view(-1)
            errG = loss_func(output, label)
            errG.backward()
            optimizer_G.step()
            train_bar.set_description("train epoch[{}/{}] loss_D:{:.3f} loss_G:{:.3f}".format(
                epoch + 1, epochs, loss_D_all.item(), errG.item()))


if __name__ == "__main__":
    train()
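To see what the `loss_func` calls in the training loop compute, here is a scalar sketch in plain Python. The function names are illustrative.

- `bce` is the per-sample binary cross-entropy that `nn.BCELoss` averages over a batch. The clamp to `eps` is an assumption added here to avoid `log(0)`; it is not PyTorch's internal clamping.
- The discriminator loss labels real images 1 and fakes 0.
- The generator step reuses `real_label` for the fakes, so it minimizes −log D(G(z)). This is the "non-saturating" generator loss from the original GAN paper.

The non-saturating loss matters early in training. When D confidently rejects a fake (D(G(z)) close to 0), −log D(G(z)) is large and steep. log(1 − D(G(z))) would be nearly flat there and give G little signal.

```python
import math


def bce(p: float, y: float, eps: float = 1e-12) -> float:
    # Binary cross-entropy for one prediction p = D(.) and target label y
    p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


def discriminator_loss(d_real: float, d_fake: float) -> float:
    # loss_D_real (label 1) + loss_D_fake (label 0)
    return bce(d_real, 1.0) + bce(d_fake, 0.0)


def generator_loss(d_fake: float) -> float:
    # errG: fakes scored against real_label = 1, i.e. -log D(G(z))
    return bce(d_fake, 1.0)


# A D that cannot tell real from fake (0.5 for both) gives 2 * log 2, about 1.386
print(round(discriminator_loss(0.5, 0.5), 3))
# G's loss is large when D rejects its fakes and small when D is fooled
print(round(generator_loss(0.1), 3), round(generator_loss(0.9), 3))
```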