Deep Learning Generative Models: GAN | Autoencoders | Diffusion Models

1. Brief introduction

1. Autoencoder

The structure and principle of the autoencoder (AE):

It consists of an encoder and a decoder, which are usually neural network models.

The input data is compressed by the encoder network into a low-dimensional code, and the decoder network then reconstructs data from that code that is as close as possible to the original input. By comparing the reconstruction with the original and minimizing the difference between them, the parameters of both the encoder and the decoder are trained.
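
A minimal sketch of this encode/decode loop, assuming Keras and flattened 784-dimensional inputs with values in [0, 1]; the layer sizes here are illustrative, not taken from this post:

from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(784,))
code = layers.Dense(32, activation='relu')(inputs)       # encoder: compress to a 32-d code
outputs = layers.Dense(784, activation='sigmoid')(code)  # decoder: reconstruct the input
autoencoder = keras.Model(inputs, outputs)
autoencoder.compile(optimizer='adam', loss='mse')        # minimize the reconstruction difference
# autoencoder.fit(x_train, x_train, epochs=10)           # note: the input is also the target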

Autoencoder features:

  1. Strong dependence on the data: the features extracted by the neural network are highly correlated with the original training set, which means the autoencoder can only compress data similar to its training data.
  2. The compression is lossy, because losing some information during dimensionality reduction is inevitable.

Applications of autoencoders:

  1. Data denoising (see the sketch after this list);
  2. Dimensionality reduction for visualization;
  3. Data generation.
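
Continuing the autoencoder sketch above, denoising training only changes what the model sees: corrupted inputs, clean targets. The noise level and the stand-in data are illustrative:

import numpy as np

x_train = np.random.rand(1000, 784)  # stand-in for real training images in [0, 1]
noisy = np.clip(x_train + 0.2 * np.random.normal(size=x_train.shape), 0.0, 1.0)
# autoencoder.fit(noisy, x_train, epochs=10)  # learn to map noisy inputs back to clean images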

2. Variational Autoencoder (VAE)

2.1 Introduction to VAEs

Variational Autoencoder (VAE):

In an ordinary autoencoder, an image must be provided as input; the image is encoded to obtain a latent vector, and the latent vector is decoded to obtain an image corresponding to the original.

A variational autoencoder can construct the latent vector by itself and generate new images: give it a random latent vector drawn from the standard normal distribution, and the decoder produces the desired image, with no original image required. In practice, a trade-off must be made between reconstruction accuracy and how closely the latent vectors obey the standard normal distribution. KL divergence can be used to measure the similarity of the two distributions: the smaller the value, the closer the two probability distributions are.
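
For reference, when the encoder outputs a diagonal Gaussian and the prior is the standard normal, this KL term has a well-known closed form (a standard result, stated here for completeness rather than taken from this post):

D_{KL}\big(\mathcal{N}(\mu, \sigma^2) \,\|\, \mathcal{N}(0, I)\big) = \frac{1}{2} \sum_i \left( \mu_i^2 + \sigma_i^2 - \log \sigma_i^2 - 1 \right)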

By regularizing the latent space in this way, variational autoencoders can generate new data continuously, allowing smooth interpolation between different attributes and eliminating gaps in the latent space that would return suboptimal outputs.

Variational autoencoders encode the latent attributes of the input probabilistically (as distributions), rather than deterministically (as single values) like ordinary autoencoders.

2.2 VAE structure

On the basis of AE, VAE adds Gaussian noise (random sampling from a normal distribution) to the encoder output, so that the decoder (the generator) becomes robust to noise. To prevent the noise from vanishing, every p(Z|X) is pushed toward the standard normal distribution: the encoder's mean is driven toward 0 as much as possible, while the variance is kept close to 1. In this way, when the decoder is not yet well trained, the whole system can reduce the noise; as the decoder gradually fits, the noise is increased again.
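
A minimal sketch of this encoder side in Keras, assuming TensorFlow 2.x, flattened 784-dimensional inputs, and a 2-dimensional latent space (all sizes illustrative). The reparameterization trick keeps the sampling differentiable: z = mu + sigma * eps, with eps drawn from a standard normal, so the noise scale is learned through log sigma^2:

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

class Sampling(layers.Layer):
    """Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I)."""
    def call(self, inputs):
        z_mean, z_log_var = inputs
        eps = tf.random.normal(shape=tf.shape(z_mean))
        return z_mean + tf.exp(0.5 * z_log_var) * eps  # the noise scale is learned

latent_dim = 2
inputs = keras.Input(shape=(784,))
h = layers.Dense(256, activation='relu')(inputs)
z_mean = layers.Dense(latent_dim)(h)     # encoder mean, pushed toward 0 by the KL term
z_log_var = layers.Dense(latent_dim)(h)  # encoder log-variance, pushed toward 0 (sigma -> 1)
z = Sampling()([z_mean, z_log_var])
encoder = keras.Model(inputs, [z_mean, z_log_var, z], name='vae_encoder')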

3. Generative Adversarial Network (GAN)

The structure and principle of Generative Adversarial Networks (GANs):

GANs consist of two parts: a generative model and an adversarial model. The generative model is, in general, an autoencoder-style network; the adversarial model is a discriminator that judges whether data is real or fake.

During training, the discriminator is trained first: both fake and real data are given to the discriminator to optimize the discriminative model. Then the generator is trained: the parameters of the discriminator are fixed, and the parameters of the generator are optimized through backpropagation, hoping that its generated data, after passing through the discriminator, scores as close to 1 as possible. At that point, only the loss function needs to be adjusted. JS divergence is symmetric and can be used to measure the difference between two distributions.
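
For completeness, the standard definition of JS divergence in terms of KL divergence and the mixture distribution M is:

JS(P \,\|\, Q) = \frac{1}{2} D_{KL}(P \,\|\, M) + \frac{1}{2} D_{KL}(Q \,\|\, M), \qquad M = \frac{1}{2}(P + Q)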

The essence of both VAE and GAN is a mapping between probability distributions.

CycleGAN

CycleGAN can be regarded as the fusion of two GANs. One GAN, composed of a generator G and a discriminator DY, performs image generation and discrimination from domain X to domain Y; the other, composed of a generator F and a discriminator DX, performs image generation and discrimination from domain Y to domain X. Together, the two networks form a cycle.
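
Besides the two adversarial losses, the cycle is enforced by a cycle-consistency loss; in the original CycleGAN paper this is an L1 reconstruction term over both directions:

\mathcal{L}_{cyc}(G, F) = \mathbb{E}_x\big[ \lVert F(G(x)) - x \rVert_1 \big] + \mathbb{E}_y\big[ \lVert G(F(y)) - y \rVert_1 \big]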


4. Diffusion model

Diffusion models are inspired by non-equilibrium thermodynamics. They define a Markov chain of diffusion steps that gradually adds random noise to the data, and then learn to reverse the diffusion process to construct the desired data samples from the noise. Unlike VAE or flow models, diffusion models are learned with a fixed procedure, and the latent variables have high dimensionality (the same as the original data).

A key property of Markov chains is stationarity: under the action of the Markov chain, a probability distribution that changes over time must tend toward a certain stationary distribution (such as a Gaussian distribution). As long as the termination time is long enough, the probability distribution will approach this stationary distribution. Moreover, based on the forward process of the Markov chain, each step of the reverse process can be approximated as a Gaussian distribution. The Gaussian distribution is a very simple distribution requiring little computation, which is the most important reason diffusion is fast.
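
A small sketch of the forward (noising) process under standard DDPM-style assumptions; the schedule and the names (T, betas, alpha_bar, q_sample) are illustrative, not from this post. It uses the Gaussian closed form that lets x_t be sampled directly from x_0:

import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)  # noise schedule beta_t
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)      # alpha_bar_t = prod over s <= t of (1 - beta_s)

def q_sample(x0, t):
    """Sample x_t directly from x_0 using the closed form:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps, eps ~ N(0, I)."""
    eps = np.random.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

x0 = np.zeros((8, 8))          # a toy "image"
x_noisy = q_sample(x0, t=999)  # near t = T the sample is almost pure Gaussian noise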

  • Text generation: Li X L, Thickstun J, Gulrajani I, et al. Diffusion-LM Improves Controllable Text Generation[J]. arXiv preprint arXiv:2205.14217, 2022.
  • Few-shot conditional generation: Sinha A, Song J, Meng C, et al. D2C: Diffusion-decoding models for few-shot conditional generation[J]. Advances in Neural Information Processing Systems, 2021, 34: 12533-12548.
  • Translation: Nachmani E, Dovrat S. Zero-Shot Translation using Diffusion Models[J]. arXiv preprint arXiv:2111.01471, 2021.
  • Dialogue generation: Liu S, Chen H, Ren Z, et al. Knowledge diffusion for neural dialogue generation[C]//Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2018: 1489-1498.
  • Video generation: Ho J, Salimans T, Gritsenko A, et al. Video diffusion models[J]. arXiv preprint arXiv:2204.03458, 2022.
  • Music generation: Symbolic music generation with diffusion models[J]. arXiv preprint arXiv:2103.16091, 2021.
  • Handwriting generation: Diffusion models for handwriting generation[J]. arXiv preprint arXiv:2011.06704, 2020.
  • Cross-modal generation: Discrete Contrastive Diffusion for Cross-Modal and Conditional Generation[J]. arXiv preprint arXiv:2206.07771, 2022.
  • Speech generation: Diff-TTS: A denoising diffusion model for text-to-speech[J]. arXiv preprint arXiv:2104.01409, 2021;
    Grad-TTS: A diffusion probabilistic model for text-to-speech[C]//International Conference on Machine Learning. PMLR, 2021: 8599-8608.
  • Few-Shot Diffusion Models
  • Retrieval-Augmented Diffusion Models

2. The code

Simple GAN

import numpy as np
from matplotlib import pyplot as plt

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Eager execution is on by default in TensorFlow 2.x, so the old
# tf.enable_eager_execution() call (TF 1.x only) is no longer needed.

class GAN():
    def __init__(self):  # global settings
        self.img_shape = (52, 52, 1)   # input images are 52x52 grayscale
        self.save_path = r'./GAN.h5'   # where the trained model is saved
        self.img_path = r'./photo'     # where generated images are saved
        self.batch_size = 20
        self.latent_dim = 100          # the generator takes a 100-dimensional input tensor
        self.sample_interval = 1       # epoch interval at which sample images are generated
        self.epoch = 10  # 100
        # build the GAN components
        self.generator_model = self.build_generator()          # generator
        self.discriminator_model = self.build_discriminator()  # discriminator
        self.model = self.build_model()                        # combined model used to train the generator
    def build_generator(self):  # generator
        input=keras.Input(shape=(self.latent_dim,))
        x=layers.Dense(256)(input)
        x=layers.LeakyReLU(alpha=0.2)(x)
        x=layers.BatchNormalization(momentum=0.8)(x)
        x = layers.Dense(512)(x)
        x = layers.LeakyReLU(alpha=0.2)(x)
        x = layers.BatchNormalization(momentum=0.8)(x)
        x = layers.Dense(1024)(x)
        x = layers.LeakyReLU(alpha=0.2)(x)
        x = layers.BatchNormalization(momentum=0.8)(x)
        x=layers.Dense(np.prod(self.img_shape),activation='sigmoid')(x)
        output=layers.Reshape(self.img_shape)(x)
        model=keras.Model(inputs=input,outputs=output,name='generator')
        model.summary()
        return model

    def build_discriminator(self):  # discriminator
        input=keras.Input(shape=self.img_shape)   # the input is an image
        x=layers.Flatten()(input)                 # flatten the image
        x=layers.Dense(512)(x)                    # fully connected layer
        x=layers.LeakyReLU(alpha=0.2)(x)
        x=layers.Dense(256)(x)
        x=layers.LeakyReLU(alpha=0.2)(x)
        output=layers.Dense(1,activation='sigmoid')(x)
        model=keras.Model(inputs=input,outputs=output,name='discriminator')
        model.summary()
        return model
    
    def build_model(self):  # build the combined GAN model
        self.discriminator_model.compile(loss='binary_crossentropy',
                                    optimizer=keras.optimizers.Adam(learning_rate=0.0001, beta_1=0.000001),
                                    metrics=['accuracy'])  # set the discriminator's loss and optimizer

        self.discriminator_model.trainable = False  # freeze the discriminator inside the combined model

        inputs = keras.Input(shape=(self.latent_dim,))
        img = self.generator_model(inputs)
        outputs = self.discriminator_model(img)
        model = keras.Model(inputs=inputs, outputs=outputs)
        model.summary()  # print the model structure
        model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.0001, beta_1=0.000001),
                      loss='binary_crossentropy',
                      )
        return model
    
    def load_data(self):
        images = np.load(r'./data/raw/arr_0.npy')
        labels = np.load(r'./data/raw/arr_1.npy')
        (train_images, train_labels) = (images[:30000], labels[:30000])
        (test_images, test_labels) = (images[30000:], labels[30000:])
        train_images = train_images / 255.0                   # normalize pixel values to [0, 1]
        train_images = np.expand_dims(train_images, axis=3)   # add a channel dimension of size 1
        print('img_number:', train_images.shape)
        return train_images
    
    def train(self):
        train_images = self.load_data()  # load the data
        # labels for real and fake batches
        valid = np.ones((self.batch_size, 1))
        fake = np.zeros((self.batch_size, 1))
        step = int(train_images.shape[0] / self.batch_size)  # steps per epoch
        print('step:', step)
        for epoch in range(self.epoch):
            train_images = (tf.random.shuffle(train_images)).numpy()  # reshuffle once per epoch
            if epoch % self.sample_interval == 0:
                self.generate_sample_images(epoch)

            for i in range(step):

                idx = np.arange(i * self.batch_size, (i + 1) * self.batch_size, 1)  # indices for this batch
                imgs = train_images[idx]  # the images at those indices
                noise = np.random.normal(0, 1, (self.batch_size, self.latent_dim))  # standard Gaussian noise
                gan_imgs = self.generator_model.predict(noise)  # generate images from the noise
                # ---------------------------------------------- train the discriminator
                discriminator_loss_real = self.discriminator_model.train_on_batch(imgs, valid)  # real data gets label 1
                discriminator_loss_fake = self.discriminator_model.train_on_batch(gan_imgs, fake)  # generated data gets label 0
                discriminator_loss = 0.5 * np.add(discriminator_loss_real, discriminator_loss_fake)
                # ----------------------------------------------- train the generator
                noise = np.random.normal(0, 1, (self.batch_size, self.latent_dim))
                generator_loss = self.model.train_on_batch(noise, valid)
                if i % 10 == 0:  # print every ten steps
                    print("epoch:%d step:%d [discriminator_loss: %f, acc: %.2f%%] [generator_loss: %f]" % (
                        epoch, i, discriminator_loss[0], 100 * discriminator_loss[1], generator_loss))

#         self.model.save(self.save_path)  # save the trained model
    
    def pred(self):  # load the model and generate an image
        model = keras.models.load_model(self.save_path)  # load the saved model
        model.summary()                                  # print the combined model summary
        noise = np.random.normal(0, 1, (1, self.latent_dim))  # sample a noise vector
        generator = keras.Model(inputs=model.layers[1].input, outputs=model.layers[1].output)  # extract the generator
        generator.summary()                           # print the generator summary
        img = np.squeeze(generator.predict([noise]))  # drop all size-1 dimensions
        plt.imshow(img)
        plt.show()
        print(img.shape)
        
    def generate_sample_images(self, epoch):  # generate sample images

        row, col = 5, 5  # grid size
        noise = np.random.normal(0, 1, (row * col, self.latent_dim))  # sample noise
        gan_imgs = self.generator_model.predict(noise)
        fig, axs = plt.subplots(row, col)  # a 5x5 canvas
        idx = 0

        for i in range(row):
            for j in range(col):
                axs[i, j].imshow(gan_imgs[idx, :, :, 0], cmap='gray')
                axs[i, j].axis('off')
                idx += 1
#         fig.savefig(self.img_path + "/%d.png" % epoch)
        plt.close()  # close the canvas


if __name__ == '__main__':
    gan = GAN()
    gan.train()

Recommended reading

The diffusion model has recently become popular in the field of image generation. How do you see its momentum starting to surpass GAN? - Zhihu
