Classic generative model algorithms: VAE & GAN (with Python source code)

[Figure: generative models]

1. Overview

Deep learning is an artificial intelligence technique whose greatest strength is its ability to analyze and process complex data. Within deep learning, generative models and discriminative models are two important concepts that help us understand how deep learning works and how it accomplishes different tasks.

The difference between the two is that a generative model predicts and generates new data by learning the joint distribution of the input data, while a discriminative model classifies and recognizes by learning the relationship between input data and output labels.

Specifically, generative models are mainly used to generate new data samples, such as images, speech, and text. The basic idea is to learn a distribution over the input data and then sample from that distribution to produce new data. The most commonly used generative models include autoencoders, variational autoencoders, and generative adversarial networks.
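To make this concrete, here is a minimal sketch of the "learn a distribution, then sample from it" workflow that all generative models share. The "model" here is deliberately trivial, a 1-D Gaussian fitted to observed data rather than a neural network:

import numpy as np

# Observed data: 1000 samples from some unknown process
data = np.random.normal(loc=5.0, scale=2.0, size=1000)

# "Learning" step: estimate the parameters of a distribution from the data
mu, sigma = data.mean(), data.std()

# "Generation" step: sample brand-new data points from the learned distribution
new_samples = np.random.normal(mu, sigma, size=10)
print(new_samples)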


Autoencoders are a simple but effective generative model whose basic idea is to compress the input data into a low-dimensional space and then restore it back to the original space. Variational autoencoders are an upgraded version of autoencoders that can generate data with richer randomness and diversity. The generative adversarial network consists of two parts, a generator and a discriminator, where the generator produces new data samples and the discriminator judges whether the generated data is real or fake.
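As a minimal sketch of the autoencoder idea just described (compress, then restore), the following Keras model maps flattened 784-dimensional inputs (e.g. MNIST digits) through a 32-dimensional bottleneck; the layer sizes are illustrative, not prescriptive:

from keras.layers import Input, Dense
from keras.models import Model

inp = Input(shape=(784,))
code = Dense(32, activation='relu')(inp)        # encoder: compress to a low-dimensional code
recon = Dense(784, activation='sigmoid')(code)  # decoder: restore back to the original space

autoencoder = Model(inp, recon)
autoencoder.compile(optimizer='adam', loss='mse')
# Trained with the input as its own target: autoencoder.fit(x, x, epochs=..., batch_size=...)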


In contrast, discriminative models are mainly used for classification and recognition tasks, such as image classification, speech recognition, and natural language processing. The basic idea is to perform classification and recognition by learning the relationship between input data and output labels. Common discriminative models include support vector machines, random forests, and convolutional neural networks.

The models covered in this column before this chapter are all discriminative models.

The convolutional neural network is a very successful discriminative model that has achieved great success in image and natural language processing. Its basic structure consists of multiple convolutional and pooling layers, which extract the spatial and temporal characteristics of the data during learning, thereby enabling classification and recognition.
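As a minimal sketch of such a discriminative model, the following Keras network stacks one convolutional layer and one pooling layer in front of a classification head; real architectures use many more layers, and the sizes here are illustrative:

from keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense
from keras.models import Model

inp = Input(shape=(28, 28, 1))                  # e.g. a grayscale image
x = Conv2D(32, (3, 3), activation='relu')(inp)  # convolution extracts spatial features
x = MaxPooling2D((2, 2))(x)                     # pooling downsamples the feature maps
x = Flatten()(x)
out = Dense(10, activation='softmax')(x)        # class probabilities p(y|x), e.g. 10 digits

cnn = Model(inp, out)
cnn.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])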

2. Typical generative model architectures: VAE & GAN

2.1 VAE

2.1.1 Introduction

VAE is a generative model; its full name is Variational Autoencoder. It is an improved version of the autoencoder, capable of generating data with richer randomness and diversity.

A VAE predicts and generates new data by learning the latent distribution of the input data. Unlike traditional autoencoders, the VAE introduces the concept of latent variables: it compresses the input data into a low-dimensional latent space and then samples from that latent space to generate new data. This process can be seen as random sampling from a given distribution, so the generated samples have high randomness and diversity.
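The sampling step at the heart of the VAE is usually implemented with the reparameterization trick: the encoder outputs a mean and a (log-)variance for each input, and the latent code z is drawn from that Gaussian. A minimal numpy sketch, with illustrative encoder outputs:

import numpy as np

mu = np.array([0.2, -0.1])        # encoder's predicted mean (illustrative values)
log_var = np.array([-1.0, -0.5])  # encoder's predicted log-variance (illustrative values)

eps = np.random.normal(size=mu.shape)  # noise drawn from N(0, 1)
z = mu + np.exp(0.5 * log_var) * eps   # a sample from N(mu, sigma^2)
# z is then passed to the decoder to produce a generated sample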

2.1.2 Model processing flow

Suppose we have the following network. The input is a vector whose elements are all 1, and the output target is a cat-face image. After many iterations of training, ideally, whenever the input is an all-ones vector, we obtain this cat-face image. The principle behind this idea is essentially memorization: the image's parameters are stored in the network, and the fitted mapping takes the input to those stored image parameters.


The significance of this is to reduce the cat-face image, which lives in a high-dimensional space, to a vector in a low-dimensional space, and to try to represent more pictures. This time we use one-hot vectors instead of all-ones vectors: if [1, 0, 0, 0] represents a cat, we use [0, 1, 0, 0] for a dog. While this works, we can store at most 4 images. By increasing the length of the vector and the parameters of the network, we can represent more pictures: for example, with a four-dimensional one-hot vector the network can express four different faces. Given different inputs, it will output different faces.


The disadvantage of such an input vector is its sparsity. An effective optimization is to use a real-valued vector instead of a vector of 0s and 1s. Such a real-valued vector can be regarded as an encoding of the original picture, which leads to the concept of encoding/decoding.

e.g., [3.3, 4.5, 2.1, 9.8] for a cat and [3.4, 2.1, 6.7, 4.2] for a dog.

Such a known initial vector can serve as our latent variable.
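A tiny numpy illustration of the difference (the vector values are the illustrative ones from above): a 4-dimensional one-hot code can distinguish at most 4 images, while a 4-dimensional real-valued code offers a continuum of possible encodings:

import numpy as np

one_hot_cat = np.array([1, 0, 0, 0])        # one-hot: at most 4 distinct codes in 4-D
one_hot_dog = np.array([0, 1, 0, 0])

dense_cat = np.array([3.3, 4.5, 2.1, 9.8])  # real-valued: infinitely many possible codes
dense_dog = np.array([3.4, 2.1, 6.7, 4.2])
# A decoder network maps such codes (latent variables) back to images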

In the autoencoder model, an encoder encodes pictures into vectors, and a decoder then restores these vectors into pictures.


In the figure below, the final face is described by six factors, and different values of these factors represent different characteristics.

[Figure: a face described by six latent factors; different factor values produce different features]

In the modeling above, the mapping between inputs and outputs is strictly fixed. To achieve the effect of a generative model (generating unseen outputs), we introduce a prior distribution (the sampling distribution of the latent variable z) and a posterior distribution (the distribution of z inferred from the input data x). The prior is usually chosen to be the standard normal distribution N(0, 1), while the posterior is produced from the input data x and is usually chosen to be Gaussian. Training then minimizes two loss terms: the reconstruction error and the KL divergence.


The reconstruction error measures the difference between the generated samples and the original samples, and is computed in the same way as in traditional autoencoders. The KL divergence measures the difference between the posterior distribution and the prior distribution, ensuring that the generated samples retain a certain degree of randomness and diversity.
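A sketch of these two loss terms in numpy, assuming the encoder outputs a mean mu and log-variance log_var for a diagonal Gaussian posterior and the prior is the standard normal N(0, 1); the KL term below is the closed form for that Gaussian pair:

import numpy as np

def vae_loss(x, x_recon, mu, log_var):
    # Reconstruction error: distance between the original and the generated sample
    recon_loss = np.sum((x - x_recon) ** 2)
    # KL divergence KL( N(mu, sigma^2) || N(0, 1) ), closed form for diagonal Gaussians
    kl_loss = -0.5 * np.sum(1 + log_var - mu ** 2 - np.exp(log_var))
    return recon_loss + kl_loss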

2.2 GAN

2.2.1 Introduction

GAN is a generative model; its full name is Generative Adversarial Network. It consists of two parts, a generator and a discriminator, and can generate realistic images, audio, text, and other data.

In a GAN, the generator produces realistic data samples, and the discriminator judges whether its input is real or fake. The two are trained against each other: the generator produces fake samples, the discriminator classifies real and fake samples, the generator adjusts its parameters to produce more realistic samples, and the discriminator updates in turn to improve its judgment. Eventually the generator can produce realistic samples.

2.2.2 Key Points of Generative Adversarial Networks

  • An adversarial network has a generator (Generator) and a discriminator (Discriminator).
  • The generator produces pictures from random noise. Since these pictures are "imagined" by the generator, we call them fake images.
  • The fake images produced by the generator, together with the real images from the training set, are passed to the discriminator, which judges whether each one is real or fake.

2.2.3 Training Criteria for Generative Adversarial Networks

  1. The pictures produced by the generator are realistic enough to fool the discriminator;

  2. The discriminator is "smart" enough to distinguish real images from generated ones;

  3. During training, the generator and the discriminator eventually reach a balance in their "confrontation", at which point training ends;

  4. The generator can then be separated out and used on its own to "generate" the desired pictures.

2.2.4 Generative Adversarial Network Model Processing Flow

[Figure: GAN processing flow, with the generator on the left and the discriminator on the right]

Input to the generative adversarial network:

All we have is a face dataset collected from the real world; crucially, we do not even have class labels for it, i.e., we do not know whose face each image shows.

Output of the generative adversarial network:

Given an input noise vector, the network produces a simulated face image so realistic that it can pass for a real one.

First, the discriminative model is the network in the right half of the figure. Intuitively, it is a simple neural network: the input is an image and the output is a probability value used to judge real or fake (greater than 0.5 means real, less than 0.5 means fake); "real vs. fake" is just a human-defined threshold on that probability. Second, the generative model can also be regarded as a neural network: its input is a set of random numbers z, and its output is an image rather than a single value. As the figure shows, there are two datasets, one real and one fake.
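The thresholding itself is nothing more than this (a trivial sketch; 0.73 is just an example output):

prob = 0.73                               # example discriminator output
label = 'real' if prob > 0.5 else 'fake'
print(label)                              # -> real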

GAN training objectives:

  • The purpose of the discriminator network is to distinguish whether an input image comes from the real sample set or the fake sample set: if the input is a real sample, the output should be close to 1; if it is fake, close to 0. That constitutes good discrimination.
  • The purpose of the generator network is to fabricate samples, making its fabrication ability as strong as possible, so that the discriminator cannot tell whether a given sample is real or fake.

The "adversarial" in GAN comes from this pairing: the generator trains itself to produce fake pictures that look more and more realistic, while the discriminator trains itself to become better and better at spotting fakes.
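Formally, this game is the standard GAN minimax objective from Goodfellow et al.'s original paper: the discriminator D maximizes the value below (scoring real samples near 1 and fakes near 0), while the generator G minimizes it (pushing D(G(z)) toward 1):

\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]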

GAN training: alternating individual training


  • Discriminator training

    Take the real sample set with labels all 1 and the fake sample set with labels all 0.
    Considered on its own, training the discriminator then becomes a simple supervised
    binary classification problem, which can be fed directly to the neural network for training.

  • Generator training

    Training the generator actually means training the generator-discriminator cascade.
    We set the labels of the generated fake samples to 1, i.e., during this step the network treats the fake samples as if they were real.
    Why? Because this is exactly what pushes the generator to confuse the discriminator and makes the generated fake samples gradually approach real ones.
    So for generator training we have a sample set (only fake samples, no real ones) and corresponding labels (all 1).

    Note that when training this cascaded network, one crucial detail is to not update the discriminator's parameters: the error is backpropagated through the discriminator all the way to the generator, and only the generator's parameters are updated.

After the generator has been trained, we can use the new generator to produce new fake samples from fresh noise z, and these post-training fake samples should be more realistic. This gives us a new set of real and fake samples, so the whole process can be repeated. We call this process alternating individual training.

3. Applications of generative and discriminative models in AIGC

3.1 Applications of generative models in AIGC

3.1.1 Image generation

In AIGC, image generation is an important task. Through image generation techniques, we can produce realistic image samples for applications such as data augmentation and image inpainting. Commonly used generative models for this task include GANs and VAEs.

Taking GAN as an example: it consists of two parts, a generator and a discriminator. The generator produces realistic image samples, while the discriminator judges whether a generated image is real. During training, the two compete against each other, and eventually the generator can produce realistic image samples.

3.1.2 Natural Language Generation

Natural language generation is another important task that is widely used in AIGC. Through natural language generation technology, we can produce text that follows grammatical rules and semantic logic, for applications such as chatbots and machine translation. Commonly used generative models for this task include LSTMs and Transformers.

Taking the Transformer as an example: it is a very successful natural language generation model that learns the grammatical structure and semantic information of text in order to produce coherent output. In AIGC, the Transformer is widely used in machine translation, dialogue systems, and other fields.

3.2 Applications of discriminative models in AIGC

3.2.1 Image Classification

Image classification is one of the most common tasks in deep learning and is also widely used in AIGC. Through image classification technology, input images are sorted into different categories, enabling automatic image classification and labeling. Commonly used discriminative models for this task include the convolutional neural network (CNN).

Taking CNN as an example: it is a very successful image classification model that uses convolutional and pooling layers to extract features from images, followed by fully connected layers for classification. In AIGC, CNNs are widely used in image classification, object detection, and other fields.

3.2.2 Natural Language Classification

Natural language classification is another important task that is widely used in AIGC. Through natural language classification technology, input text is sorted into different categories, enabling automatic text classification and labeling. Commonly used discriminative models for this task include convolutional neural networks and recurrent neural networks (RNNs).

Taking RNN as an example: it is a discriminative model suited to sequence data. It learns the sequential and contextual information in text, enabling natural language classification and recognition. In AIGC, RNNs are widely used in sentiment analysis, text classification, and other fields.

To sum up, both generative and discriminative models are widely used in AIGC, spanning images, natural language, and many other fields. By understanding the principles and application scenarios of these techniques in depth, we can better apply them to solve practical problems.

4. Code practice: generating handwritten digits with a GAN

from __future__ import print_function, division

import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'  # suppress TensorFlow info/warning logs

import matplotlib.pyplot as plt
import numpy as np

from keras.datasets import mnist
# Note: LeakyReLU is imported from keras.layers; the old
# keras.layers.advanced_activations path was removed in newer Keras versions.
from keras.layers import (BatchNormalization, Dense, Flatten, Input,
                          LeakyReLU, Reshape)
from keras.models import Model, Sequential
from keras.optimizers import Adam

class GAN():
    def __init__(self):
        self.img_rows = 28
        self.img_cols = 28
        self.channels = 1
        self.img_shape = (self.img_rows, self.img_cols, self.channels)
        self.latent_dim = 100

        optimizer = Adam(0.0002, 0.5)

        # Build and compile the discriminator
        self.discriminator = self.build_discriminator()
        self.discriminator.compile(loss='binary_crossentropy',
            optimizer=optimizer,
            metrics=['accuracy'])

        # Build the generator
        self.generator = self.build_generator()

        # The generator takes noise as input and generates imgs
        z = Input(shape=(self.latent_dim,))
        img = self.generator(z)

        # For the combined model we will only train the generator
        self.discriminator.trainable = False

        # The discriminator takes generated images as input and determines validity
        validity = self.discriminator(img)

        # The combined model  (stacked generator and discriminator)
        # Trains the generator to fool the discriminator
        self.combined = Model(z, validity)
        self.combined.compile(loss='binary_crossentropy', optimizer=optimizer)


    def build_generator(self):

        model = Sequential()

        model.add(Dense(256, input_dim=self.latent_dim))
        model.add(LeakyReLU(alpha=0.2))
        model.add(BatchNormalization(momentum=0.8))
        model.add(Dense(512))
        model.add(LeakyReLU(alpha=0.2))
        model.add(BatchNormalization(momentum=0.8))
        model.add(Dense(1024))
        model.add(LeakyReLU(alpha=0.2))
        model.add(BatchNormalization(momentum=0.8))
        model.add(Dense(np.prod(self.img_shape), activation='tanh'))
        model.add(Reshape(self.img_shape))

        model.summary()

        noise = Input(shape=(self.latent_dim,))
        img = model(noise)

        return Model(noise, img)

    def build_discriminator(self):

        model = Sequential()

        model.add(Flatten(input_shape=self.img_shape))
        model.add(Dense(512))
        model.add(LeakyReLU(alpha=0.2))
        model.add(Dense(256))
        model.add(LeakyReLU(alpha=0.2))
        model.add(Dense(1, activation='sigmoid'))
        model.summary()

        img = Input(shape=self.img_shape)
        validity = model(img)

        return Model(img, validity)

    def train(self, epochs, batch_size=128, sample_interval=50):

        # Load the dataset
        (X_train, _), (_, _) = mnist.load_data()

        # Rescale -1 to 1
        X_train = X_train / 127.5 - 1.
        X_train = np.expand_dims(X_train, axis=3)

        # Adversarial ground truths
        valid = np.ones((batch_size, 1))
        fake = np.zeros((batch_size, 1))

        for epoch in range(epochs):

            # ---------------------
            #  Train Discriminator
            # ---------------------

            # Select a random batch of images
            idx = np.random.randint(0, X_train.shape[0], batch_size)
            imgs = X_train[idx]

            noise = np.random.normal(0, 1, (batch_size, self.latent_dim))

            # Generate a batch of new images
            gen_imgs = self.generator.predict(noise)

            # Train the discriminator
            d_loss_real = self.discriminator.train_on_batch(imgs, valid)
            d_loss_fake = self.discriminator.train_on_batch(gen_imgs, fake)
            d_loss = 0.5 * np.add(d_loss_real, d_loss_fake)

            # ---------------------
            #  Train Generator
            # ---------------------

            noise = np.random.normal(0, 1, (batch_size, self.latent_dim))

            # Train the generator (to have the discriminator label samples as valid)
            g_loss = self.combined.train_on_batch(noise, valid)

            # Plot the progress
            print ("%d [D loss: %f, acc.: %.2f%%] [G loss: %f]" % (epoch, d_loss[0], 100*d_loss[1], g_loss))

            # If at save interval => save generated image samples
            if epoch % sample_interval == 0:
                self.sample_images(epoch)

    def sample_images(self, epoch):
        # Save a 5x5 grid of generated digits; create the output folder if needed
        os.makedirs("images", exist_ok=True)
        r, c = 5, 5
        noise = np.random.normal(0, 1, (r * c, self.latent_dim))
        gen_imgs = self.generator.predict(noise)

        # Rescale images 0 - 1
        gen_imgs = 0.5 * gen_imgs + 0.5

        fig, axs = plt.subplots(r, c)
        cnt = 0
        for i in range(r):
            for j in range(c):
                axs[i,j].imshow(gen_imgs[cnt, :,:,0], cmap='gray')
                axs[i,j].axis('off')
                cnt += 1
        fig.savefig("./images/mnist_%d.png" % epoch)
        plt.close()


if __name__ == '__main__':
    gan = GAN()
    gan.train(epochs=2000, batch_size=32, sample_interval=200)

Origin blog.csdn.net/qq_38853759/article/details/130471765