GAN (Generative Adversarial Network) study notes (3): DCGAN principles


1. Introduction to DCGAN

DCGAN stands for Deep Convolutional Generative Adversarial Network.
Paper: "Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks" (Radford, Metz, and Chintala, 2015).

In DCGAN, the discriminator is a CNN (convolutional neural network) that takes an image as input and outputs the probability that the image is real.
So what does the G network (generator) look like in DCGAN?
The G network is essentially a CNN run in reverse: starting from a noise vector, each layer gradually enlarges the feature maps until a full image is produced. Since this is the opposite of what convolution does, the operation is commonly called deconvolution.

1.1 Features of DCGAN

Besides the generator being a reversed CNN, DCGAN differs from a vanilla GAN in the following ways:
1. All pooling layers are removed. The G network uses transposed convolutions for upsampling, and the D network uses strided convolutions instead of pooling.
2. Batch Normalization is applied to every layer except the output layer of the generator and the input layer of the discriminator. BN stabilizes learning and helps with training problems caused by poor initialization.
3. ReLU is the activation function in the G network, except for the output layer, which uses Tanh.
4. LeakyReLU is the activation function in the D network.
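
To make these four rules concrete, here is a minimal sketch (my own illustration, not the full model later in this post) of one generator block and one discriminator block; the channel counts and kernel settings are arbitrary placeholders:

import torch.nn as nn

# One G block: transposed conv for upsampling (rule 1), BN (rule 2), ReLU (rule 3).
g_block = nn.Sequential(
    nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1, bias=False),
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
)

# One D block: strided conv instead of pooling (rule 1), BN (rule 2), LeakyReLU (rule 4).
d_block = nn.Sequential(
    nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1, bias=False),
    nn.BatchNorm2d(128),
    nn.LeakyReLU(0.2, inplace=True),
)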

2. Several important concepts

2.1 Downsampling

Downsampling means shrinking an image; its main purposes are to make the image fit a display area and to generate thumbnails. In a CNN, both pooling layers and strided convolution layers perform downsampling. The difference is that the shrinking caused by convolution is a by-product of extracting features, while pooling shrinks the image explicitly to reduce the dimension of the features.
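
As a quick standalone shape check (not part of the model code later in this post), both a stride-2 convolution and a 2×2 max pooling halve the spatial size of a 64×64 input:

import torch
import torch.nn as nn

x = torch.randn(1, 3, 64, 64)   # a batch of one 64x64 RGB image
conv = nn.Conv2d(3, 8, kernel_size=3, stride=2, padding=1)
pool = nn.MaxPool2d(kernel_size=2, stride=2)
print(conv(x).shape)            # torch.Size([1, 8, 32, 32])
print(pool(x).shape)            # torch.Size([1, 3, 32, 32])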

2.2 Upsampling

Where there is downsampling, there is also upsampling. Upsampling means enlarging an image, and it refers to any technique that produces a higher-resolution image from a lower-resolution one. This also explains how the G network can turn a noise vector into a picture.
The two common methods are deconvolution and unpooling. Here we only introduce deconvolution, because that is what we will use.

2.3 Deconvolution (Transposed Convolution)

Deconvolution is also known as fractionally strided convolution or transposed convolution. In the figure below, the left side shows convolution and the right side shows deconvolution: the convolution maps a 4×4 image to a 2×2 image, the deconvolution maps a 2×2 image to a 4×4 image, and the kernel size of both is 3. Note, however, that deconvolution only restores the size of the image; it cannot exactly recover the original pixel values. (This raises a question: in a CNN we learn the kernels of the convolutional layers, so can we also learn the kernel of a deconvolution? We can, which is what makes it usable as a network layer.)
[Figure: convolution (left) maps 4×4 to 2×2; deconvolution (right) maps 2×2 to 4×4; both kernels are 3×3.]
For the details of transposed convolution, please refer to another blog post.
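
As a sketch of the figure's right-hand case (a standalone toy example with randomly initialized weights), nn.ConvTranspose2d with a 3×3 kernel maps a 2×2 input to a 4×4 output, and its kernel is a learnable parameter just like that of an ordinary convolution:

import torch
import torch.nn as nn

deconv = nn.ConvTranspose2d(1, 1, kernel_size=3)  # stride 1, no padding
x = torch.randn(1, 1, 2, 2)                       # 2x2 input
print(deconv(x).shape)                            # torch.Size([1, 1, 4, 4])
print(deconv.weight.requires_grad)                # True: the kernel is learned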

3. G model

The figure below shows the general framework of DCGAN: the generator uses deconvolution to generate images, and the discriminator uses convolution to discriminate them.
[Figure: overall DCGAN framework — deconvolutions in the generator, convolutions in the discriminator.]
The figure below shows the DCGAN generator introduced in the Deep Convolutional Generative Adversarial Networks paper. The network receives a 100×1 noise vector, denoted z, passes it through a series of layers, and finally maps the noise to a 64×64×3 image.
[Figure: the DCGAN generator from the paper — 100-d noise z, projected and reshaped to 4×4×1024, then four transposed-convolution layers up to a 64×64×3 output.]
The whole process thus turns a 1×100 vector into a 64×64×3 picture, in two kinds of steps:

1. Project and reshape: turn the 1×100 vector into a 4×4×1024 tensor via a fully connected (projection) layer followed by a reshape.
2. CONV: deconvolution (transposed convolution), applied four times to go 4×4 → 8×8 → 16×16 → 32×32 → 64×64 (shape arithmetic sketched below).
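
A standalone sketch of these two steps, independent of the full model below: with kernel 5, stride 2, padding 2, and output_padding 1, each transposed convolution computes out = (in − 1)×2 − 2×2 + 5 + 1 = 2×in, i.e. it doubles the spatial size:

import torch
import torch.nn as nn

z = torch.randn(1, 100)                 # 1x100 noise vector
proj = nn.Linear(100, 1024 * 4 * 4)     # project...
x = proj(z).reshape(1, 1024, 4, 4)      # ...and reshape to 4x4x1024
up = nn.ConvTranspose2d(1024, 512, kernel_size=5, stride=2,
                        padding=2, output_padding=1)
print(up(x).shape)                      # torch.Size([1, 512, 8, 8])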

3. Implementation code:

Generator:

import torch
import torch.nn as nn
from torch import Tensor
from typing import List


class Generator(nn.Module):
    def __init__(self, in_channel: int, out_channel: int,
                 kernel_size: List[int] = [5, 5, 5, 5],
                 padding: List[int] = [2, 2, 2, 2],
                 stride: List[int] = [2, 2, 2, 2]):
        super(Generator, self).__init__()
        self.in_channel = in_channel
        self.last_out_channel = out_channel
        self.kernel_size = kernel_size
        self.padding = padding
        self.stride = stride
        # "Project and reshape": map the noise vector to a 4x4x1024 tensor.
        self.project_layer = nn.Linear(self.in_channel, 1024 * 4 * 4)
        self.in_channel = 1024
        self.main = self._make_layers(self.kernel_size, self.padding, self.stride)
        # DCGAN weight initialization: N(0, 0.02) for conv weights, N(1, 0.02)
        # for BN scale. Note the check must include ConvTranspose2d, otherwise
        # none of the generator's conv layers would be initialized.
        for m in self.modules():
            if isinstance(m, (nn.Conv2d, nn.ConvTranspose2d)):
                nn.init.normal_(m.weight, 0.0, 0.02)
            if isinstance(m, nn.BatchNorm2d):
                nn.init.normal_(m.weight, 1.0, 0.02)
                nn.init.zeros_(m.bias)

    def _make_layers(self, kernel_size: List[int], padding: List[int], stride: List[int]):
        layers = []
        # Each block halves the channels and doubles the spatial size:
        # 1024x4x4 -> 512x8x8 -> 256x16x16 -> 128x32x32.
        for i in range(len(kernel_size) - 1):
            self.out_channel = self.in_channel // 2
            layers.extend([
                nn.ConvTranspose2d(self.in_channel, self.out_channel, kernel_size[i],
                                   output_padding=1, padding=padding[i],
                                   stride=stride[i], bias=False),
                nn.BatchNorm2d(self.out_channel),
                nn.ReLU(inplace=True),
            ])
            self.in_channel = self.out_channel
        # Output layer: no BN and a Tanh activation (DCGAN rules); 128x32x32 -> 3x64x64.
        layers.extend([
            nn.ConvTranspose2d(self.in_channel, self.last_out_channel, kernel_size[-1],
                               output_padding=1, padding=padding[-1], stride=stride[-1]),
            nn.Tanh(),
        ])
        return nn.Sequential(*layers)

    def forward(self, inputs: Tensor):
        batch_size = inputs.shape[0]
        proj = self.project_layer(inputs)                        # (N, 100) -> (N, 1024*4*4)
        reshape_proj = torch.reshape(proj, (batch_size, 1024, 4, 4))
        out = self.main(reshape_proj)                            # -> (N, 3, 64, 64)
        return out


if __name__ == "__main__":
    generator = Generator(100, 3)
    print(generator)
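
A quick smoke test (my addition, assuming the Generator class defined above): feed a batch of noise and confirm the output is a batch of 64×64×3 images whose values lie in [-1, 1] thanks to the Tanh:

import torch

gen = Generator(100, 3)
z = torch.randn(4, 100)                     # a batch of four noise vectors
imgs = gen(z)
print(imgs.shape)                           # torch.Size([4, 3, 64, 64])
print(imgs.min() >= -1, imgs.max() <= 1)    # both True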

Discriminator:

import torch
import torch.nn as nn
from torch import Tensor
from typing import List


class Discriminator(nn.Module):
    def __init__(self, in_channel: int, last_out_channel: int,
                 stride: List[int] = [2, 2, 2, 2],
                 padding: List[int] = [2, 2, 2, 2],
                 kernel_size: List[int] = [5, 5, 5, 5]):
        super(Discriminator, self).__init__()
        self.main = self._make_layer(in_channel, last_out_channel, stride, padding, kernel_size)
        # Four stride-2 convs reduce a 64x64 input to 4x4 with 512 channels.
        self.fc1 = nn.Linear(4 * 4 * 512, 1)
        self.sigmoid = nn.Sigmoid()
        # DCGAN weight initialization: N(0, 0.02) for conv weights, N(1, 0.02) for BN scale.
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.normal_(m.weight, 0.0, 0.02)
            if isinstance(m, nn.BatchNorm2d):
                nn.init.normal_(m.weight, 1.0, 0.02)
                nn.init.zeros_(m.bias)

    def _make_layer(self, in_channel, last_out_channel, stride: List[int],
                    padding: List[int], kernel_size: List[int]):
        layers = []
        # Strided convs instead of pooling: 3x64x64 -> 64x32x32 -> 128x16x16 -> 256x8x8.
        for i in range(len(stride) - 1):
            out_channel = max(in_channel * 2, 64)
            layers.append(nn.Conv2d(in_channel, out_channel, kernel_size=kernel_size[i],
                                    padding=padding[i], stride=stride[i], bias=False))
            # Per the DCGAN rules above, no BN on the discriminator's input layer.
            if i > 0:
                layers.append(nn.BatchNorm2d(out_channel))
            layers.append(nn.LeakyReLU(0.2, inplace=True))
            in_channel = out_channel
        # Final conv: 256x8x8 -> 512x4x4.
        layers.append(nn.Conv2d(in_channel, last_out_channel, kernel_size=kernel_size[-1],
                                padding=padding[-1], stride=stride[-1]))
        return nn.Sequential(*layers)

    def forward(self, inputs: Tensor) -> Tensor:
        out = self.main(inputs)
        out = torch.flatten(out, start_dim=1)   # (N, 512, 4, 4) -> (N, 512*4*4)
        out = self.sigmoid(self.fc1(out))       # probability that the image is real
        return out


if __name__ == "__main__":
    NetD = Discriminator(3, last_out_channel=512)
    print(NetD)
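
A matching smoke test (again my addition, assuming the Discriminator class above): a random batch of 3×64×64 images should come out as one probability per sample:

import torch

netD = Discriminator(3, last_out_channel=512)
x = torch.randn(2, 3, 64, 64)   # a batch of two random "images"
print(netD(x).shape)            # torch.Size([2, 1]), values in (0, 1)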

Train script:

import torch
import torch.nn as nn
from data_set import MyDataset            # custom dataset (not shown in this post)
from torch.utils.data import DataLoader
from torchvision import transforms
from tqdm import tqdm
from model.generater import Generator
from model.discriminator import Discriminator


def train(epochs: int = 10, lr=2e-4):
    real_label = 1
    fake_label = 0
    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    # Normalize images to [-1, 1] to match the generator's Tanh output.
    my_transform = transforms.Compose([transforms.ToTensor(),
                                       transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
                                       ])
    dataset = MyDataset(transform=my_transform)
    train_loader = DataLoader(dataset, batch_size=32, shuffle=True)
    NetD = Discriminator(3, last_out_channel=512).to(device)
    NetG = Generator(100, 3).to(device)
    loss_func = nn.BCELoss()
    optimizer_D = torch.optim.Adam(NetD.parameters(), lr=lr, betas=(0.5, 0.999))
    optimizer_G = torch.optim.Adam(NetG.parameters(), lr=lr, betas=(0.5, 0.999))
    for epoch in range(epochs):
        train_bar = tqdm(train_loader)
        for data in train_bar:
            # Train D in two steps (real batch, then fake batch),
            # as recommended by ganhacks.
            optimizer_D.zero_grad()
            b_size = data.shape[0]
            label = torch.full((b_size,), real_label, dtype=torch.float32, device=device)
            output = NetD(data.to(device)).view(-1)
            loss_D_real = loss_func(output, label)
            loss_D_real.backward()
            noise = torch.randn(b_size, 100, device=device)
            fake = NetG(noise)
            label.fill_(fake_label)
            # detach() so this backward pass does not touch G's parameters.
            output = NetD(fake.detach()).view(-1)
            loss_D_fake = loss_func(output, label)
            loss_D_fake.backward()
            loss_D_all = loss_D_real.item() + loss_D_fake.item()
            optimizer_D.step()
            # Train G: try to make D classify the fakes as real.
            NetG.zero_grad()
            label.fill_(real_label)
            output = NetD(fake).view(-1)
            errG = loss_func(output, label)
            errG.backward()
            optimizer_G.step()
            train_bar.desc = "train epoch[{}/{}] loss_D:{:.3f}   loss_G:{:.3f}".format(
                epoch + 1, epochs, loss_D_all, errG.item())


if __name__ == "__main__":
    train()
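
After training, images can be sampled from the generator with random noise. A minimal sketch (my addition; the checkpoint path is hypothetical, and the value_range argument requires a recent torchvision, where older versions called it range):

import torch
from torchvision.utils import save_image
from model.generater import Generator

netG = Generator(100, 3)
# netG.load_state_dict(torch.load("netG.pth"))   # hypothetical checkpoint path
netG.eval()
with torch.no_grad():
    fake = netG(torch.randn(16, 100))
# Map the Tanh output back from [-1, 1] to [0, 1] before saving a 4x4 grid.
save_image(fake, "samples.png", nrow=4, normalize=True, value_range=(-1, 1))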
