Machine Learning Notes - Autoencoders

1. What is an autoencoder?

        Autoencoders are one of the main ways to develop unsupervised learning models. But what is an autoencoder?

        In short, an autoencoder receives data, compresses and encodes it, and then reconstructs it from the encoded representation. The model is trained until the loss is minimized and the data is reproduced as closely as possible. Through this process, the autoencoder learns the important features of the data.

        An autoencoder is a neural network composed of multiple layers. A defining aspect of an autoencoder is that its input and output layers have exactly the same number of units, because the network is designed to replicate its input data: it analyzes the data, reconstructs it in an unsupervised manner, and outputs a copy.

        Data passing through an autoencoder is not just a direct mapping from input to output. An autoencoder consists of three components: the encoder (input side) that compresses the data, the bottleneck that holds the compressed representation, and the decoder (output side). When data is fed into an autoencoder, the encoder compresses it to a smaller size, and the network is trained so that the decoder outputs a reconstruction of the original data from that compressed code.

        The core value of autoencoders is that the network learns the "essence", i.e. the most important features, of the input data. After training, the model can synthesize similar data, adding or removing certain target features. For example, you can train an autoencoder on noisy images and then use the trained model to remove the noise from images.

        Applications of autoencoders include anomaly detection, data denoising (e.g., of images or audio), image colorization, image inpainting, information retrieval, and dimensionality reduction.

2. The Architecture of an Autoencoder

        Autoencoders can basically be divided into three distinct components: encoder, bottleneck, and decoder.


        Encoder: The encoder is a feed-forward, fully connected neural network that compresses the input into a lower-dimensional latent-space representation. The compressed code is a distorted, reduced version of the original input.

        Code: This part of the network holds the compressed representation of the input that is fed to the decoder.

        Decoder: The decoder is also a feed-forward network, with a structure similar to the encoder's. It is responsible for reconstructing the input from the code back to the original dimensions.

        First, the input is compressed by the encoder and stored in a layer called the code; the decoder then decompresses the original input from the code. The main goal of an autoencoder is to produce an output identical to its input.

        Usually the decoder architecture is a mirror image of the encoder's, but this is not strictly required. The only hard requirement is that the dimensions of the input and output must be the same.
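
        To make the encoder / code / decoder flow concrete, here is a minimal sketch of a fully connected autoencoder in Keras. The layer sizes and the 784-dimensional input (a flattened 28x28 image) are illustrative assumptions, not requirements:

# minimal fully connected autoencoder sketch (sizes are illustrative)
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

inputs = Input(shape=(784,))                          # flattened 28x28 image
encoded = Dense(64, activation="relu")(inputs)        # encoder compresses the input
code = Dense(16, activation="relu")(encoded)          # bottleneck ("code")
decoded = Dense(64, activation="relu")(code)          # decoder mirrors the encoder
outputs = Dense(784, activation="sigmoid")(decoded)   # same dimensions as the input

autoencoder = Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")
# note that the training target is the input itself:
# autoencoder.fit(x, x, epochs=..., batch_size=...)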

3. Types of Autoencoders

1. Convolutional Autoencoder

        Convolutional autoencoders are general-purpose feature extractors. They use convolutional layers instead of fully connected layers. The principle is the same as for an ordinary autoencoder: downsample the input to a lower-dimensional latent representation, forcing the network to learn a compressed version of the input; a sketch follows below.
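
        The sketch below shrinks and restores the spatial dimensions with pooling and upsampling layers; the full example in section 6 uses strided convolutions instead, which is an equally common choice. The filter counts and MNIST-sized input are assumptions:

# sketch of a convolutional autoencoder (MNIST-sized input assumed)
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D
from tensorflow.keras.models import Model

inputs = Input(shape=(28, 28, 1))
x = Conv2D(16, (3, 3), activation="relu", padding="same")(inputs)
x = MaxPooling2D((2, 2))(x)                      # downsample 28 -> 14
x = Conv2D(8, (3, 3), activation="relu", padding="same")(x)
encoded = MaxPooling2D((2, 2))(x)                # latent feature maps, 7x7x8
x = Conv2D(8, (3, 3), activation="relu", padding="same")(encoded)
x = UpSampling2D((2, 2))(x)                      # upsample 7 -> 14
x = Conv2D(16, (3, 3), activation="relu", padding="same")(x)
x = UpSampling2D((2, 2))(x)                      # upsample 14 -> 28
outputs = Conv2D(1, (3, 3), activation="sigmoid", padding="same")(x)

conv_autoencoder = Model(inputs, outputs)
conv_autoencoder.compile(optimizer="adam", loss="mse")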

2. Denoising Autoencoder

        This type of autoencoder works on a partially corrupted input and is trained to restore the original, undistorted image. As mentioned above, this is an effective way to prevent the network from simply duplicating its input.

        The goal is for the network to reproduce the original version of the image. By comparing the corrupted data with the original data, the network learns which features of the data are most important and which are unimportant or corrupted. In other words, for a model to denoise a corrupted image, it must have extracted the important features of the image data.
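
        In practice, the only change from a plain autoencoder is the training pairs: the input is a corrupted image while the target is the clean original. A minimal sketch, assuming MNIST images scaled to [0, 1] and an assumed Gaussian noise level of 0.5:

# denoising just changes the training pairs: corrupted input, clean target
import numpy as np
from tensorflow.keras.datasets import mnist

(x_train, _), _ = mnist.load_data()
x_train = x_train.astype("float32") / 255.0

# corrupt the inputs; the clean images remain the training target
noise = np.random.normal(loc=0.0, scale=0.5, size=x_train.shape)
x_train_noisy = np.clip(x_train + noise, 0.0, 1.0)

# with any autoencoder model:
# autoencoder.fit(x_train_noisy, x_train, epochs=25, batch_size=32)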

3. Contractive Autoencoder

        The goal of a contractive autoencoder is to reduce the sensitivity of the learned representation to small variations in the training input. To achieve this, a regularization (penalty) term is added to the loss function that the autoencoder minimizes.

        Contractive autoencoders are usually employed alongside other autoencoder types rather than on their own. A denoising autoencoder makes the reconstruction function resistant to small but finite-sized perturbations of the input, while a contractive autoencoder makes the feature-extraction function resistant to infinitesimal perturbations of the input; a sketch of the penalty term follows below.
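
        As a sketch of the penalty: for a single sigmoid encoder layer h = sigmoid(xW + b), the squared Frobenius norm of the Jacobian dh/dx has a closed form, which is added to the reconstruction error scaled by a small factor. The weight lam = 1e-4 and the way W is obtained (e.g. from the encoder layer's kernel) are assumptions:

# sketch of a contractive loss for a single sigmoid encoder layer
import tensorflow as tf

def contractive_loss(x, x_hat, h, W, lam=1e-4):
	# reconstruction error
	recon = tf.reduce_mean(tf.square(x - x_hat))
	# for h = sigmoid(xW + b), dh_j/dx_i = h_j (1 - h_j) W_ij, so
	# ||J||_F^2 = sum_j (h_j (1 - h_j))^2 * ||W_j||^2
	dh = h * (1.0 - h)                             # sigmoid derivative
	w_norms = tf.reduce_sum(tf.square(W), axis=0)  # ||W_j||^2 per hidden unit
	penalty = tf.reduce_sum(tf.square(dh) * w_norms, axis=1)
	return recon + lam * tf.reduce_mean(penalty)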

4. Variational Autoencoder

        Variational autoencoders (VAEs) make assumptions about the distribution of the latent variables and use a stochastic gradient variational Bayes estimator during training.

        During training, the encoder produces latent distributions for the different features of the input images.

        Essentially, the model learns common features of training images and assigns them the probability of their occurrence. The images can then be reverse engineered using probability distributions to generate new images that are similar to the original training images. 

        This type of autoencoder can generate new images like a GAN. Since VAEs are more flexible and customizable than GANs in generative behavior, they are suitable for any type of artistic generation.
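
        The sampling step at the heart of a VAE is usually implemented with the "reparameterization trick": the encoder outputs a mean and log-variance per latent dimension, and the code is sampled as z = mu + sigma * epsilon. A minimal Keras sketch (the layer name is ours, and the surrounding encoder/decoder and the KL-divergence term of the loss are omitted):

# sketch of the VAE sampling step (reparameterization trick)
import tensorflow as tf
from tensorflow.keras import layers

class Sampling(layers.Layer):
	def call(self, inputs):
		z_mean, z_log_var = inputs
		# sample epsilon ~ N(0, I) and shift/scale it so that the
		# sampling is differentiable with respect to mean and variance
		epsilon = tf.random.normal(shape=tf.shape(z_mean))
		return z_mean + tf.exp(0.5 * z_log_var) * epsilon

# typical usage inside an encoder (latent_dim is an assumption):
# z_mean = layers.Dense(latent_dim)(x)
# z_log_var = layers.Dense(latent_dim)(x)
# z = Sampling()([z_mean, z_log_var])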

4. How is an Autoencoder Different from PCA?

        PCA and autoencoders are two popular methods for reducing the dimensionality of feature spaces.

        PCA is fundamentally a linear transformation, whereas autoencoders can describe complex nonlinear processes. If we were to build a linear network (i.e., without non-linear activation functions at any layer), we would observe a dimensionality reduction similar to PCA's, as the sketch below illustrates.
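
        A quick way to see the correspondence is to compare scikit-learn's PCA with a purely linear autoencoder on the same data. The toy data and the sizes below are illustrative assumptions:

# comparing PCA with a linear autoencoder (toy data, illustrative sizes)
import numpy as np
from sklearn.decomposition import PCA
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

X = np.random.rand(1000, 20).astype("float32")
X = X - X.mean(axis=0)                   # center the data, as PCA does

# PCA projection onto 3 components
X_pca = PCA(n_components=3).fit_transform(X)

# linear autoencoder: no activation function anywhere
inp = Input(shape=(20,))
code = Dense(3, use_bias=False)(inp)     # linear bottleneck
out = Dense(20, use_bias=False)(code)
linear_ae = Model(inp, out)
linear_ae.compile(optimizer="adam", loss="mse")
linear_ae.fit(X, X, epochs=50, batch_size=32, verbose=0)
# the learned bottleneck spans (approximately) the same subspace as the
# top principal components, though not necessarily the same basis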

        PCA attempts to discover a low-dimensional hyperplane that describes the raw data, while autoencoders are able to learn nonlinear manifolds (informally, continuous, non-intersecting surfaces).

Figure: Left, the nonlinear manifold an autoencoder can find; right, the linear dimensionality reduction performed by PCA.

         Compared to autoencoders, PCA is computationally faster and less expensive. However, due to the large number of parameters, autoencoders are prone to overfitting.

5. How is an autoencoder different from a GAN?

        1. Both can be used to generate data, but in different ways. An autoencoder tries to find a low-dimensional representation of the data conditioned on a specific (higher-dimensional) input, while a GAN tries to create samples rich enough to cover the distribution of the real data, guided by a discriminator.

        2. Although they both fall into the category of unsupervised learning, they are different approaches to solving problems.

        A GAN is a generative model - it is supposed to learn to generate new samples of the dataset.

        Variational autoencoders are generative models, but ordinary autoencoders merely reconstruct their input and cannot generate genuinely new samples.

Reference: https://www.quora.com/What-is-the-difference-between-Generative-Adversarial-Networks-and-Autoencoders

6. Example 1: Denoising Autoencoder

1. Overview

        One application of denoising autoencoders is preprocessing images to improve the accuracy of optical character recognition (OCR) algorithms. If you have applied OCR before, you know that even a small amount of stray noise (e.g., printer ink smudges, poor image quality during scanning) can seriously hurt recognition accuracy. With a denoising autoencoder, images can be automatically preprocessed to improve their quality and thus the accuracy of the OCR algorithm.

        Here we deliberately add noise to the MNIST training images. The goal is for our autoencoder to learn to remove the noise from input images efficiently.

2. Reference code

        Create the autoencoder_for_denoising.py file and insert the following code.

# import the necessary packages
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import Conv2DTranspose
from tensorflow.keras.layers import LeakyReLU
from tensorflow.keras.layers import Activation
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Reshape
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model
from tensorflow.keras import backend as K
import numpy as np

class ConvAutoencoder:
	@staticmethod
	def build(width, height, depth, filters=(32, 64), latentDim=16):
		# initialize the input shape to be "channels last" along with
		# the channels dimension itself
		inputShape = (height, width, depth)
		chanDim = -1
		# define the input to the encoder
		inputs = Input(shape=inputShape)
		x = inputs

		# loop over the number of filters
		for f in filters:
			# apply a CONV => RELU => BN operation
			x = Conv2D(f, (3, 3), strides=2, padding="same")(x)
			x = LeakyReLU(alpha=0.2)(x)
			x = BatchNormalization(axis=chanDim)(x)
		# flatten the network and then construct our latent vector
		volumeSize = K.int_shape(x)
		x = Flatten()(x)
		latent = Dense(latentDim)(x)
		# build the encoder model
		encoder = Model(inputs, latent, name="encoder")

		# start building the decoder model which will accept the
		# output of the encoder as its inputs
		latentInputs = Input(shape=(latentDim,))
		x = Dense(np.prod(volumeSize[1:]))(latentInputs)
		x = Reshape((volumeSize[1], volumeSize[2], volumeSize[3]))(x)
		# loop over our number of filters again, but this time in
		# reverse order
		for f in filters[::-1]:
			# apply a CONV_TRANSPOSE => RELU => BN operation
			x = Conv2DTranspose(f, (3, 3), strides=2, padding="same")(x)
			x = LeakyReLU(alpha=0.2)(x)
			x = BatchNormalization(axis=chanDim)(x)
		# apply a single CONV_TRANSPOSE layer used to recover the
		# original depth of the image
		x = Conv2DTranspose(depth, (3, 3), padding="same")(x)
		outputs = Activation("sigmoid")(x)
		# build the decoder model
		decoder = Model(latentInputs, outputs, name="decoder")
		# our autoencoder is the encoder + decoder
		autoencoder = Model(inputs, decoder(encoder(inputs)), name="autoencoder")
		# return a 3-tuple of the encoder, decoder, and autoencoder
		return (encoder, decoder, autoencoder)


# set the matplotlib backend so figures can be saved in the background
import matplotlib
matplotlib.use("Agg")
# import the necessary packages
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.datasets import mnist
import matplotlib.pyplot as plt
import numpy as np
import argparse
import cv2
# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-s", "--samples", type=int, default=8, help="# number of samples to visualize when decoding")
ap.add_argument("-o", "--output", type=str, default="output.png", help="path to output visualization file")
ap.add_argument("-p", "--plot", type=str, default="plot.png", help="path to output plot file")
args = vars(ap.parse_args())

# initialize the number of epochs to train for and batch size
EPOCHS = 25
BS = 32
# load the MNIST dataset
print("[INFO] loading MNIST dataset...")
((trainX, _), (testX, _)) = mnist.load_data()
# add a channel dimension to every image in the dataset, then scale
# the pixel intensities to the range [0, 1]
trainX = np.expand_dims(trainX, axis=-1)
testX = np.expand_dims(testX, axis=-1)
trainX = trainX.astype("float32") / 255.0
testX = testX.astype("float32") / 255.0

# sample noise from a random normal distribution centered at 0.5 (since
# our images lie in the range [0, 1]) and a standard deviation of 0.5
trainNoise = np.random.normal(loc=0.5, scale=0.5, size=trainX.shape)
testNoise = np.random.normal(loc=0.5, scale=0.5, size=testX.shape)
trainXNoisy = np.clip(trainX + trainNoise, 0, 1)
testXNoisy = np.clip(testX + testNoise, 0, 1)

# construct our convolutional autoencoder
print("[INFO] building autoencoder...")
(encoder, decoder, autoencoder) = ConvAutoencoder.build(28, 28, 1)
opt = Adam(learning_rate=1e-3)
autoencoder.compile(loss="mse", optimizer=opt)
# train the convolutional autoencoder
H = autoencoder.fit(trainXNoisy, trainX, validation_data=(testXNoisy, testX), epochs=EPOCHS, batch_size=BS)
# construct a plot that plots and saves the training history
N = np.arange(0, EPOCHS)
plt.style.use("ggplot")
plt.figure()
plt.plot(N, H.history["loss"], label="train_loss")
plt.plot(N, H.history["val_loss"], label="val_loss")
plt.title("Training Loss and Accuracy")
plt.xlabel("Epoch #")
plt.ylabel("Loss/Accuracy")
plt.legend(loc="lower left")
plt.savefig(args["plot"])


# use the convolutional autoencoder to make predictions on the
# testing images, then initialize our list of output images
print("[INFO] making predictions...")
decoded = autoencoder.predict(testXNoisy)
outputs = None
# loop over our number of output samples
for i in range(0, args["samples"]):
	# grab the original image and reconstructed image
	original = (testXNoisy[i] * 255).astype("uint8")
	recon = (decoded[i] * 255).astype("uint8")
	# stack the original and reconstructed image side-by-side
	output = np.hstack([original, recon])
	# if the outputs array is empty, initialize it as the current
	# side-by-side image display
	if outputs is None:
		outputs = output
	# otherwise, vertically stack the outputs
	else:
		outputs = np.vstack([outputs, output])
# save the outputs image to disk
cv2.imwrite(args["output"], outputs)

        Then enter the following command to train.

python autoencoder_for_denoising.py --output output_denoising.png --plot plot_denoising.png

3. Training results

        The training results after 25 epochs:

The training process is stable, with no signs of overfitting.

        In the corresponding denoising result image, the left column shows the original MNIST digits with added noise, and the right column shows the output of the denoising autoencoder. You can see that the denoising autoencoder is able to recover the original signal from the image while removing the noise.

Origin: blog.csdn.net/bashendixie5/article/details/123567960