Variational Autoencoder (VAE) Code

One, Auto-Encoder (AE)

        The purpose of an autoencoder is to learn to reconstruct its own input, so its input and its target output are the same. Take a 28*28 black-and-white handwritten digit image (single channel) as an example. Expressed as a matrix, the really useful features are the positions where the value is 1 and how they are arranged in the matrix. Most of the border pixels are 0, and they are redundant features for a specific task.

        If you do not use a CNN for feature extraction, the common method is to flatten the matrix into a 784-dimensional vector and then turn this vector into a Tensor as the input of the neural network. The purpose of AE is to compress this 784-dimensional vector into a low-dimensional vector that can still represent the original 784-dimensional input.
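
        The flattening step can be sketched with NumPy. This is only an illustration: the `image` array below is a hypothetical stand-in for one MNIST digit, not data from the dataset.

```python
import numpy as np

# Hypothetical stand-in for one 28x28 grayscale image (pixel values 0..255).
image = np.zeros((28, 28), dtype=np.uint8)
image[10:18, 12:16] = 255  # a small bright stroke; the border pixels stay 0

# Flatten the matrix row by row into a 784-dimensional vector
# and scale the pixel values to the range [0, 1].
vector = image.reshape(-1).astype("float32") / 255.0

print(vector.shape)               # shape of the flattened input
print(int((vector > 0).sum()))    # how many pixels actually carry information
```

        Only a small fraction of the 784 entries is non-zero, which is the redundancy that the AE is expected to compress away.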

        For example: an image of the handwritten digit 5 is flattened to 784 dimensions, AE is used to reduce the dimension, and a 20-dimensional vector is obtained. Assuming that the original image obeys a 784-dimensional Gaussian distribution, after learning through AE it obeys a 20-dimensional Gaussian distribution. Similarly, an image of the handwritten digit 6 will also be compressed into a 20-dimensional Gaussian distribution. But note that although the codes for 5 and 6 both obey a 20-dimensional Gaussian distribution, their mean vectors and variance vectors must be significantly different. Note: in fact, the distribution of the AE code vector is unknown; it is only assumed to be Gaussian here.

        The code of AE is as follows:

import numpy as np
import matplotlib.pyplot as plt
from keras.datasets import mnist
from keras.models import Model
from keras.layers import Input, Dense


# Load the MNIST dataset
(x_train, _), (x_test, _) = mnist.load_data()

# Set the latent feature dimension
latent_size = 64

# Preprocess: scale pixels to [0, 1] and flatten each 28x28 image to 784 values
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_train = x_train.reshape((len(x_train), np.prod(x_train.shape[1:])))
x_test = x_test.reshape((len(x_test), np.prod(x_test.shape[1:])))

# Define the input layer
input_img = Input(shape=(784,))
# Define the encoding layer
encoded = Dense(latent_size, activation='relu')(input_img)
# Define the decoding layer
decoded = Dense(784, activation='sigmoid')(encoded)
# Build the autoencoder model
autoencoder = Model(input_img, decoded)
# Compile the model
autoencoder.compile(optimizer='adam', loss='mse')  # MSE pushes each reconstructed pixel toward the original

# Train the autoencoder
autoencoder.fit(x_train, x_train,
                epochs=50,
                batch_size=256,
                shuffle=True,
                validation_data=(x_test, x_test))

# Build the encoder model
encoder = Model(input_img, encoded)
# Build the decoder model
encoded_input = Input(shape=(latent_size,))
decoder_layer = autoencoder.layers[-1]
decoder = Model(encoded_input, decoder_layer(encoded_input))
# Encode and decode the test set
encoded_imgs = encoder.predict(x_test)
decoded_imgs = decoder.predict(encoded_imgs)

# Visualize the results
n = 10  # number of images to visualize
plt.figure(figsize=(20, 4))
for i in range(n):
    # Original image
    ax = plt.subplot(2, n, i + 1)
    plt.imshow(x_test[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

    # Reconstructed image
    ax = plt.subplot(2, n, i + 1 + n)
    plt.imshow(decoded_imgs[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
plt.show()

Two, Variational Autoencoder (VAE)

        The purpose of AE is to reduce the dimensionality of a high-dimensional representation and to represent each sample with a low-dimensional code. The variational autoencoder improves on AE: VAE also learns how the code is allowed to fluctuate. In terms of generated results, what VAE produces always bears some similarity to the training data, whereas AE can generate an image that resembles nothing at all (because the code being decoded lies outside the range of codes produced from the training data). Qualitatively speaking, AE can only produce results from a fixed code, while VAE allows the code to carry some error.

        For example: if the latent feature is a 10-dimensional vector, then this vector should obey a 10-dimensional Gaussian distribution, so there will be a 10-dimensional mean vector and a 10-dimensional standard deviation vector. The VAE method is: 1. For the mean vector, use the neural network to learn it directly; 2. For the standard deviation vector, also predict it with the network (in practice as a log-variance), while constraining the resulting distribution to stay as close as possible to a 10-dimensional standard normal distribution. Then use the reparameterization trick to sample a latent vector, decode it to get the reconstructed sample, and calculate the loss function.
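
        The sampling step can be sketched in NumPy as follows. The values of `mu` and `log_var` are made up for illustration and stand in for what an encoder would output; the function name `reparameterize` is chosen here to mirror the idea, not taken from any library.

```python
import numpy as np

z_dim = 10  # latent dimension used in the example above

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps, with eps drawn from a standard normal."""
    eps = rng.standard_normal(mu.shape)   # all randomness lives in eps
    std = np.exp(0.5 * log_var)           # log-variance -> standard deviation
    return mu + std * eps

rng = np.random.default_rng(0)

# Hypothetical encoder outputs for a batch of 4 inputs:
# every latent dimension has mean 2.0 and standard deviation 0.5.
mu = np.full((4, z_dim), 2.0)
log_var = np.full((4, z_dim), np.log(0.25))

z = reparameterize(mu, log_var, rng)
print(z.shape)  # one 10-dimensional latent vector per input
```

        Predicting the log-variance instead of the standard deviation is a common design choice: the network output can be any real number, and `exp` guarantees a positive standard deviation.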

        Therefore, VAE actually learns a distribution function. This probability distribution has no convenient closed form, so it is learned with a neural network: a neural network can approximate a very wide class of nonlinear functions, and gradient descent gives a practical way to fit it.

        In addition, since VAE constrains the latent distribution, its loss function has two parts: 1. The same reconstruction error as AE, which can be MAE, MSE, BCE (which treats each pixel value as the probability of a binary label), etc.; 2. A loss based on distribution similarity, namely the KL divergence. Minimizing it maximizes the similarity between the multivariate Gaussian distribution of the latent features and the N-dimensional standard normal distribution. This is somewhat similar to PINN (physics-informed neural network): the extra term aims to reduce the solution space searched during training so that the training process converges faster. Both parts of the loss come from a likelihood objective, that is, the model tries to fit the training data as closely as possible, so the great majority of the pictures generated by the decoder bear some similarity to the training data.
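
        The two parts of the loss can be written out in NumPy as a minimal sketch. It assumes BCE on decoder logits for the reconstruction term and a diagonal Gaussian for the latent code; the function names and the random inputs are illustrative only.

```python
import numpy as np

def bce_with_logits(x, logits):
    """Numerically stable per-pixel binary cross-entropy computed from logits."""
    return np.maximum(logits, 0) - logits * x + np.log1p(np.exp(-np.abs(logits)))

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions."""
    return -0.5 * np.sum(1 + log_var - mu**2 - np.exp(log_var), axis=1)

def vae_loss(x, logits, mu, log_var):
    """Return (total, reconstruction, KL), each averaged over the batch."""
    rec = np.sum(bce_with_logits(x, logits), axis=1)  # part 1: reconstruction
    kl = kl_to_standard_normal(mu, log_var)           # part 2: distribution similarity
    return np.mean(rec + kl), np.mean(rec), np.mean(kl)

# Made-up batch: 2 images of 784 pixels, 10 latent dimensions.
rng = np.random.default_rng(0)
x = rng.uniform(size=(2, 784))
logits = rng.normal(size=(2, 784))
mu = rng.normal(size=(2, 10))
log_var = rng.normal(size=(2, 10))

total, rec, kl = vae_loss(x, logits, mu, log_var)
print(total, rec, kl)
```

        The KL term is exactly zero when the mean is 0 and the log-variance is 0, that is, when the latent distribution already equals the standard normal.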

        The difficulty of VAE is twofold: describing quantitatively the similarity between two distributions, and keeping the sampling of the latent feature vector differentiable, that is, turning a discrete sampling step into a continuous expression that has the same effect. For instance, if the average height of a class is 170cm with a standard deviation of 5cm, sampling a height is the same as taking 170cm as the center and adding 5*N(0,1); about 68% of the values obtained fall between 165 and 175. In other words, sampling from N(170, 5^2) can be replaced by 170 + 5*N(0,1), where the randomness sits entirely in N(0,1) and the mean and standard deviation enter through ordinary arithmetic.
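
        The height example can be checked numerically. The sketch below assumes a mean of 170 and a standard deviation of 5, and compares sampling directly from that normal distribution with shifting and scaling standard normal noise.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Direct sampling from a normal distribution with mean 170 and std 5.
direct = rng.normal(loc=170.0, scale=5.0, size=n)

# Reparameterized sampling: draw eps from N(0, 1), then shift and scale it.
eps = rng.standard_normal(n)
reparam = 170.0 + 5.0 * eps

print("direct :", direct.mean(), direct.std())
print("reparam:", reparam.mean(), reparam.std())

# Fraction of reparameterized samples that fall between 165 and 175.
within = np.mean((reparam > 165) & (reparam < 175))
print("within one std:", within)
```

        Both sets of samples have the same statistics up to sampling noise. The difference is that in the second form the sample is an ordinary function of the mean and the standard deviation (its derivative with respect to the mean is 1, and with respect to the standard deviation it is eps), so gradients can flow through it.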

        To sum up, the idea of VAE is: we need to approximate the distribution function P(x) of the samples from the observed values (that is, the samples). Unfortunately, this is difficult to do directly, but we can hand this work over to the neural network. From the Gaussian mixture model (GMM), we know that a complex distribution can be obtained by superimposing any number of normal distributions. Therefore, VAE starts from the standard normal distribution, transforms it to obtain an infinite number of normal distributions, and then superimposes these distributions to fit the distribution function of the original data. The purpose of the decoder is to approximately fit the distribution P(x) of the original data.
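
        The superposition idea can be illustrated with a finite, one-dimensional mixture. This is only a toy sketch with two hand-picked components; the VAE itself works with a continuum of Gaussians indexed by the latent vector.

```python
import numpy as np

def gaussian_pdf(x, mean, std):
    """Density of a 1-D normal distribution."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))

def mixture_pdf(x, weights, means, stds):
    """Weighted sum of Gaussian densities, evaluated at each point of x."""
    x = np.asarray(x, dtype=float)[..., None]  # add an axis for the components
    return np.sum(weights * gaussian_pdf(x, means, stds), axis=-1)

# Hand-picked mixture parameters (illustrative values only).
weights = np.array([0.3, 0.7])
means = np.array([-2.0, 1.5])
stds = np.array([0.5, 1.0])

xs = np.linspace(-10.0, 10.0, 4001)
density = mixture_pdf(xs, weights, means, stds)

# A valid density integrates to 1 (checked here with a simple Riemann sum).
area = np.sum(density) * (xs[1] - xs[0])
print(area)
```

        The resulting density has two peaks, a shape that no single normal distribution can produce, even though every component is an ordinary Gaussian.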

        In the figure above, the mean u(x) and the variance E(x) are each produced by a neural network. The code of VAE is as follows.

# %%
import os
import tensorflow as tf
import numpy as np
import keras
from keras.layers import Dense,Input,concatenate,Lambda,add
from matplotlib import pyplot as plt
from keras.datasets import mnist
import keras.backend as K

# %%
# Load the MNIST dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()
# Preprocess: scale pixels to [0, 1] and flatten each image to 784 values
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_train = x_train.reshape((len(x_train), np.prod(x_train.shape[1:])))
x_test = x_test.reshape((len(x_test), np.prod(x_test.shape[1:])))

# %%
print(x_train.shape, y_train.shape)
print(x_test.shape, y_test.shape)

# %%
h_dim = 20
batchsz = 512
lr = 1e-3
z_dim = 10

# %%
class VAE(keras.Model):
    # Variational autoencoder
    def __init__(self):
        super(VAE, self).__init__()
        # Encoder network
        self.fc1 = Dense(128)
        self.fc2 = Dense(z_dim)  # predicts the mean
        self.fc3 = Dense(z_dim)  # predicts the log-variance

        # Decoder network
        self.fc4 = Dense(128)
        self.fc5 = Dense(784)

    def encoder(self, x):
        # Compute the mean and log-variance of the latent distribution
        h = tf.nn.relu(self.fc1(x))
        # Mean vector
        mu = self.fc2(h)
        # Log-variance vector
        log_var = self.fc3(h)

        return mu, log_var

    def decoder(self, z):
        # Generate image data from the latent variable z
        out = tf.nn.relu(self.fc4(z))
        out = self.fc5(out)
        # Return the image logits, a 784-dimensional vector
        return out

    def reparameterize(self, mu, log_var):
        # Reparameterization trick: sample epsilon from a standard normal
        eps = tf.random.normal(tf.shape(log_var))
        # Compute the standard deviation
        std = tf.exp(log_var*0.5)
        # z = mu + sigma * epsilon keeps sampling differentiable w.r.t. mu and sigma
        z = mu + std * eps
        return z

    def call(self, inputs, training=None):
        # Forward pass
        # Encoder: [b, 784] => [b, z_dim], [b, z_dim]
        mu, log_var = self.encoder(inputs)
        # Sample with the reparameterization trick
        z = self.reparameterize(mu, log_var)
        # Decode the sample
        x_hat = self.decoder(z)
        # Return the reconstruction logits, the mean, and the log-variance
        return x_hat, mu, log_var
    
# Create the network object
model = VAE()

# %%
model.build(input_shape=(4, 784))
# Optimizer
optimizer = tf.optimizers.Adam(lr)

# %%
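# Batch the training set so that each step sees batchsz images
train_db = tf.data.Dataset.from_tensor_slices(x_train).shuffle(batchsz * 5).batch(batchsz)

for epoch in range(10):  # train for 10 epochs
    for step, x in enumerate(train_db):  # iterate over mini-batches
        # Flatten to [b, 784] (a no-op here, since x_train is already flat)
        x = tf.reshape(x, [-1, 784])
        # Record operations for automatic differentiation
        with tf.GradientTape() as tape:
            # Forward pass
            x_rec_logits, mu, log_var = model(x)
            # Reconstruction loss
            rec_loss = tf.nn.sigmoid_cross_entropy_with_logits(labels=x, logits=x_rec_logits)
            rec_loss = tf.reduce_sum(rec_loss) / x.shape[0]
            # KL divergence between N(mu, sigma^2) and N(0, 1)
            kl_div = -0.5 * (log_var + 1 - mu**2 - tf.exp(log_var))
            kl_div = tf.reduce_sum(kl_div) / x.shape[0]
            # Combine the two loss terms
            loss = rec_loss + 1. * kl_div
        # Compute the gradients
        grads = tape.gradient(loss, model.trainable_variables)
        # Apply the update
        optimizer.apply_gradients(zip(grads, model.trainable_variables))

        if step % 100 == 0:
            # Print the training loss
            print(epoch, step, 'kl div:', float(kl_div), 'rec loss:', float(rec_loss))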

# %%
# evaluation
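# Test generation: sample z at random from a standard normal distribution
z = tf.random.normal((batchsz, z_dim))
logits = model.decoder(z)  # generate images with the decoder only
x_hat = tf.sigmoid(logits)  # map logits to pixel values in [0, 1]
x_hat = tf.reshape(x_hat, [-1, 28, 28]).numpy() *255.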
x_hat = x_hat.astype(np.uint8)
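
        The evaluation code stops once `x_hat` holds the generated images. One way to inspect them is to tile a few into a single grid image. The helper below is a sketch that is not part of the original code: `tile_images` is a hypothetical name, and random pixels stand in for `x_hat` so that the snippet runs on its own.

```python
import numpy as np

def tile_images(images, rows, cols):
    """Arrange rows*cols images of shape [H, W] into one [rows*H, cols*W] array."""
    images = np.asarray(images)[: rows * cols]
    n, h, w = images.shape
    if n != rows * cols:
        raise ValueError("not enough images to fill the grid")
    grid = images.reshape(rows, cols, h, w)  # [rows, cols, H, W]
    grid = grid.transpose(0, 2, 1, 3)        # [rows, H, cols, W]
    return grid.reshape(rows * h, cols * w)

# Stand-in for x_hat: 512 random uint8 images of size 28x28.
fake = np.random.default_rng(0).integers(0, 256, size=(512, 28, 28), dtype=np.uint8)
grid = tile_images(fake, rows=10, cols=10)
print(grid.shape)
```

        With the real `x_hat` in place of `fake`, the grid can be displayed with `plt.imshow(grid, cmap='gray')`.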


Three, the mathematical derivation of VAE

        1. VAE assumes that the code vector follows a normal distribution, and the decoder generates pictures from the code vector. Because the objective is a likelihood, the decoder tries to make the generated pictures similar to those in the training data. So the function fitted by the decoder is actually an approximation of the distribution of the original data.

        2. How to get P(z)? The answer is to use neural networks to train directly.

        3. Derivation of maximum likelihood estimation

        4. Since the KL divergence is always greater than or equal to 0, dropping it gives a lower bound.

        5. The lower bound can be rewritten by conditional probability.

        6. Continue to split. It follows that P(z) must obey the normal distribution, otherwise the bound cannot be optimized. The reconstruction error must also be minimized in order for the network to converge.
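
        The steps above match the standard evidence lower bound derivation, written out below as a sketch. The notation is an assumption, since the original formulas are not reproduced here: q(z|x) denotes the encoder's distribution over codes, P(x|z) the decoder, and P(z) the standard normal prior.

```latex
\begin{align}
\log P(x)
  &= \int q(z \mid x)\, \log P(x)\, dz \\
  &= \int q(z \mid x)\, \log \frac{P(z, x)}{P(z \mid x)}\, dz \\
  &= \int q(z \mid x)\, \log \frac{P(z, x)}{q(z \mid x)}\, dz
   + \int q(z \mid x)\, \log \frac{q(z \mid x)}{P(z \mid x)}\, dz \\
  &= L_b + \mathrm{KL}\big(q(z \mid x)\,\|\,P(z \mid x)\big)
   \;\ge\; L_b \\[4pt]
L_b
  &= \int q(z \mid x)\, \log \frac{P(x \mid z)\, P(z)}{q(z \mid x)}\, dz \\
  &= \mathbb{E}_{q(z \mid x)}\big[\log P(x \mid z)\big]
   - \mathrm{KL}\big(q(z \mid x)\,\|\,P(z)\big) \\[4pt]
\mathrm{KL}\big(\mathcal{N}(\mu, \sigma^2 I)\,\|\,\mathcal{N}(0, I)\big)
  &= -\frac{1}{2} \sum_{j} \big(1 + \log \sigma_j^2 - \mu_j^2 - \sigma_j^2\big)
\end{align}
```

        Maximizing the lower bound therefore means maximizing the expected log-likelihood of the reconstruction (minimizing the reconstruction error) while minimizing the KL divergence to the prior. The last line is the closed form that the `kl_div` expression in the training loop computes, with `log_var` playing the role of log sigma^2.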

Reference: Brother Gengzhi, Autoencoder for Deep Learning (5) VAE Image Generation Actual Combat_vae Image Generation_Yanwu, Hang's Blog-CSDN Blog

Origin blog.csdn.net/weixin_44992737/article/details/131957658