Variational autoencoder and formula derivation process

The Variational Autoencoder (VAE) is a generative model that learns the probability distribution of a latent variable from data and can generate new data by sampling from this distribution.
The core idea of VAE is to use variational inference to learn the latent variable's distribution, so that the model can generate new samples and control the characteristics of those samples during generation. Below we introduce the model structure of VAE and the derivation of its formulas in detail.
Model structure of VAE

The model structure of VAE consists of two parts: an encoder and a decoder. The encoder maps the input data x to the distribution q(z|x) of the latent variable z, and the decoder maps the latent variable z back to the distribution p(x'|z) of the reconstructed data x'. During training, the goal of VAE is to minimize the reconstruction error together with the KL divergence between the latent variable distribution q(z|x) and the prior distribution p(z).
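As a concrete illustration, here is a minimal PyTorch sketch of this encoder/decoder structure. The layer sizes, the fully connected layers, and names such as `latent_dim` are illustrative assumptions, not part of the original text; the sampling step that connects the two halves (the reparameterization trick) is described later in the article and sketched there.

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    """Minimal VAE skeleton: an encoder that outputs (mu, log_var) of q(z|x)
    and a decoder that maps a latent code z back to a reconstruction x'."""

    def __init__(self, input_dim=784, hidden_dim=400, latent_dim=20):
        super().__init__()
        # Encoder: x -> hidden -> (mu, log_var) of q(z|x)
        self.enc = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
        self.fc_mu = nn.Linear(hidden_dim, latent_dim)
        self.fc_logvar = nn.Linear(hidden_dim, latent_dim)
        # Decoder: z -> hidden -> x' (Sigmoid assumes data scaled to [0, 1])
        self.dec = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, input_dim), nn.Sigmoid(),
        )

    def encode(self, x):
        h = self.enc(x)
        return self.fc_mu(h), self.fc_logvar(h)

    def decode(self, z):
        return self.dec(z)
```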

Specifically, the goal of VAE can be expressed as the following formula:
$$\mathcal{L} = \mathbb{E}_{q(z \mid x)}\bigl[-\log p(x' \mid z)\bigr] + D_{KL}\bigl(q(z \mid x)\,\|\,p(z)\bigr)$$
where the first term represents the reconstruction error, i.e. the difference between the training data x and the data x' generated by the decoder, which can be computed with measures such as cross-entropy. The second term is the KL divergence between the latent variable distribution q(z|x) and the prior distribution p(z); it measures the distance between q(z|x) and p(z), that is, whether q(z|x) matches the prior distribution p(z) well.
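For the setting assumed in the rest of this article, where q(z|x) is a diagonal Gaussian with mean μ and standard deviation σ and the prior p(z) is a standard normal, the KL term has a well-known closed form (a standard identity, not stated explicitly in the original):

$$D_{KL}\bigl(\mathcal{N}(\mu, \operatorname{diag}(\sigma^2))\,\|\,\mathcal{N}(0, I)\bigr) = \frac{1}{2}\sum_{j}\bigl(\mu_j^2 + \sigma_j^2 - \log \sigma_j^2 - 1\bigr)$$

This closed form is what makes the KL term cheap to compute exactly during training.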
Since q(z|x) is generally taken to be a Gaussian distribution, we can map the input data x through the encoder network to a mean vector μ and a variance vector σ, and then sample the latent variable z from the Gaussian defined by this mean and variance. Specifically, this can be expressed with the following formula (the reparameterization trick):

$$z = \mu + \sigma \odot \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I)$$

where $\epsilon$ is a noise vector drawn from a standard normal distribution and $\odot$ denotes element-wise multiplication. In this way, the encoder network maps the input data x to the distribution q(z|x) of the latent variable z.
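A minimal sketch of this sampling step in PyTorch, assuming the encoder outputs the mean `mu` and the log-variance `log_var` (as in the skeleton above):

```python
import torch

def reparameterize(mu: torch.Tensor, log_var: torch.Tensor) -> torch.Tensor:
    """Sample z = mu + sigma * eps with eps ~ N(0, I).

    Working with log_var instead of sigma keeps the standard deviation
    positive and numerically stable: sigma = exp(0.5 * log_var).
    """
    std = torch.exp(0.5 * log_var)   # sigma
    eps = torch.randn_like(std)      # noise vector eps ~ N(0, I)
    return mu + std * eps            # element-wise multiplication
```

Because the randomness is pushed into `eps`, gradients can flow through `mu` and `log_var` back into the encoder.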
In the decoder part, we need to map the latent variable z back to the distribution p(x'|z) of the reconstructed data x'. Generally speaking, this mapping can be implemented with a decoder network, such as a fully connected network or a convolutional neural network. Unlike the encoder network, the decoder network takes the latent variable z as input and produces the reconstructed data x' as output.
In variational autoencoders, we usually assume that the distribution of the reconstructed data x' is generated by a distribution family Q(x'|z) with some parameters, which are determined by the output of the decoder network. Assuming that our decoder network is a neural network with parameters θ, then the distribution of the reconstructed data x' can be expressed as:
$$p(x' \mid z) = Q\bigl(x' \mid z;\, \theta\bigr)$$
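A common concrete choice for this family, assumed here to connect the pieces (the original does not spell it out), is a Gaussian whose mean is the decoder output x' = f_θ(z) with a fixed variance σ². Under that assumption the negative log-likelihood reduces, up to an additive constant, to a squared error:

$$p_\theta(x \mid z) = \mathcal{N}\bigl(x;\, f_\theta(z),\, \sigma^2 I\bigr) \;\Rightarrow\; -\log p_\theta(x \mid z) = \frac{1}{2\sigma^2}\bigl\| x - f_\theta(z) \bigr\|^2 + \text{const}$$

This is why the mean squared error used next is a natural reconstruction loss.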
Next, we need to define a loss function to measure the gap between the reconstructed data x' and the original data x. Generally speaking, we can use loss functions such as the mean squared error (MSE) or the cross-entropy to measure this gap. In this article, we take the mean squared error as an example. The loss function can be expressed as:
$$L_{rec} = \frac{1}{N}\sum_{i=1}^{N}\bigl\| x_i - x_i' \bigr\|^2$$
where N is the number of samples, x_i is the original data of the i-th sample, and x_i' is its reconstruction. Since we wish to minimize the reconstruction error, we need to minimize the loss function L_rec.
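A direct translation of this reconstruction loss into PyTorch (a sketch; `x` and `x_recon` are assumed to be batches of flattened samples):

```python
import torch
import torch.nn.functional as F

def reconstruction_loss(x_recon: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
    """Mean squared error L_rec = (1/N) * sum_i ||x_i - x_i'||^2.

    reduction='sum' adds up the squared errors over all elements, and
    dividing by the batch size N averages over samples (not over pixels).
    """
    n = x.shape[0]
    return F.mse_loss(x_recon, x, reduction="sum") / n
```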

Finally, we need to combine the encoder and decoder networks to build an end-to-end variational autoencoder model. In order to achieve this goal, we need to define a final objective function, which is the cost function of the variational autoencoder:
$$L(\theta, \phi) = L_{rec} + \frac{1}{N}\sum_{i=1}^{N} D_{KL}\bigl(q_\phi(z \mid x_i)\,\|\,p(z)\bigr)$$
where D_KL denotes the Kullback-Leibler divergence, q_φ(z|x_i) denotes the posterior distribution of the latent variable z computed from the given sample x_i and the encoder parameters φ, and p(z) denotes the prior distribution. The cost function L(θ,φ) consists of the reconstruction error plus the KL divergence between this posterior and the prior over the latent variable, and we need to minimize it. By differentiating the cost function, we can train the variational autoencoder with the backpropagation algorithm, updating the network parameters to minimize the cost.
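Putting the pieces together, here is a sketch of one training step, using the closed-form KL for a diagonal Gaussian posterior with a standard normal prior and the hypothetical `VAE`, `reparameterize`, and `reconstruction_loss` helpers sketched above:

```python
import torch

def kl_divergence(mu: torch.Tensor, log_var: torch.Tensor) -> torch.Tensor:
    """Closed-form D_KL(q_phi(z|x) || N(0, I)), averaged over the batch."""
    kl_per_sample = 0.5 * torch.sum(mu.pow(2) + log_var.exp() - log_var - 1, dim=1)
    return kl_per_sample.mean()

def train_step(model, x, optimizer):
    """One gradient step on L(theta, phi) = L_rec + D_KL."""
    optimizer.zero_grad()
    mu, log_var = model.encode(x)
    z = reparameterize(mu, log_var)      # differentiable sampling of z
    x_recon = model.decode(z)
    loss = reconstruction_loss(x_recon, x) + kl_divergence(mu, log_var)
    loss.backward()                      # backpropagation through both terms
    optimizer.step()
    return loss.item()
```

Because the reparameterization keeps the sampling step differentiable, a single backward pass updates both the decoder parameters θ and the encoder parameters φ.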

Source: blog.csdn.net/weixin_44857463/article/details/129673964