[Image Processing] Image Noise Reduction Using Autoencoder (Improved Version)

1. Description

        An autoencoder is a neural network that learns to compress and reconstruct input data. It consists of an encoder that compresses data into a low-dimensional representation and a decoder that reconstructs the original data from the compressed representation. The model is trained using unsupervised learning and aims to minimize the difference between the input and the reconstructed output. Autoencoders can be used for tasks such as dimensionality reduction, data denoising, and anomaly detection. They are very effective at dealing with unlabeled data and can learn meaningful representations from large datasets.

2. How the autoencoder works

        The network is given the original images x together with their noisy versions x~. It tries to produce an output x' that is as close as possible to the original image x, and in doing so it learns how to denoise the image.

[Figure: a denoising autoencoder; the encoder maps the noisy input to a latent representation, and the decoder reconstructs the clean image.]

As shown in the figure, the encoder model converts the input into a small, dense representation. The decoder model can be viewed as a generative model that reconstructs specific features from it.

The encoder and decoder networks are usually trained together as a single model. The loss function penalizes the network when its output x' differs from the original input x.

By doing so, the encoder learns to preserve as much relevant information as the constraints of the latent space allow and to discard irrelevant parts, such as noise. The decoder learns to take the compressed latent information and reconstruct it into a complete, error-free version of the input.
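
Written compactly, with E denoting the encoder, D the decoder and x~ the noisy input, the training objective is roughly:

$$x' = D\big(E(\tilde{x})\big), \qquad \min_{E,\,D}\; \mathcal{L}(x, x'),$$

where L is a per-pixel reconstruction loss (binary cross-entropy in the implementation below).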

3. How to implement an autoencoder

        Let's implement an autoencoder to denoise handwritten digits. The input is a 28x28 grayscale image, which is compressed into a 128-element latent vector.

        The encoder layer is responsible for converting the input image into a compressed representation in the latent space. It consists of a series of convolutional and fully connected layers. This compressed representation contains essential features of the input image that capture its underlying pattern and structure. ReLU is used as the activation function in the encoder layer. It applies an element-wise activation function that sets the output to zero for negative inputs and leaves positive inputs unchanged. The goal of using ReLU in the encoder layer is to introduce nonlinearity, allowing the network to learn complex representations and extract important features from the input data.

        The decoder layer in the code is responsible for reconstructing the image from the compressed representation in the latent space. It mirrors the structure of the encoder layer, consisting of a series of fully connected and transposed convolutional layers. The decoder layer takes the compressed representation from the latent space and reconstructs the image by inverting the operations performed by the encoder layer. It progressively upsamples the compressed representation using transposed convolutional layers and eventually produces an output image with the same dimensions as the input image. Sigmoid and ReLU activations are used in the decoder layer. The sigmoid activation squashes input values into the range between 0 and 1, mapping the output of each neuron to a probability-like value. The goal of using sigmoid in the decoder layer is to generate reconstructed output values in the range [0, 1]. Since the pixel values of the input data in this code are normalized to [0, 1], sigmoid is an appropriate activation function for reconstructing them.

        By using appropriate activation functions at the encoder and decoder layers, the autoencoder model can effectively learn to compress the input data into a low-dimensional latent space, and then reconstruct the original input data from the latent space. The choice of activation function depends on the specific requirements and characteristics of the problem being solved.
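
For reference, the two activation functions mentioned above are defined as

$$\mathrm{ReLU}(z) = \max(0, z), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}.$$

ReLU passes positive values through unchanged and zeroes out negative ones, while the sigmoid squashes any real-valued input into the interval (0, 1).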

Binary cross-entropy is used as the loss function, and Adam is used as the optimizer to minimize it. The "binary_crossentropy" loss function is commonly used in binary classification tasks and is also suitable here because the pixel values to be reconstructed lie in [0, 1]. It measures the similarity between the predicted output and the true target output. The "adam" optimizer is used to update the model's weights and biases during training. Adam (short for Adaptive Moment Estimation) is an optimization algorithm that combines the advantages of RMSprop and momentum-based optimizers. It adjusts the learning rate for each weight parameter individually and uses the first and second moments of the gradients to update the parameters efficiently.

By using binary cross-entropy as the loss function and an Adam optimizer, the autoencoder model aims to minimize the reconstruction error and optimize the parameters of the model to produce an accurate reconstruction of the input data.
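
Concretely, for a single image with N pixels, target values x_i in [0, 1] and reconstructed values x'_i, the binary cross-entropy averaged over the pixels is

$$\mathcal{L}_{\mathrm{BCE}} = -\frac{1}{N} \sum_{i=1}^{N} \Big[\, x_i \log x'_i + (1 - x_i) \log\big(1 - x'_i\big) \,\Big].$$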

Part 1: Importing Libraries and Modules

import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.datasets import mnist
from tensorflow.keras.layers import Input, Dense, Reshape, Flatten, Conv2D, Conv2DTranspose
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping

In this section, the necessary libraries and modules are imported.

  • numpy (imported as np) is a library for numerical operations.
  • matplotlib.pyplot (imported as plt) is a library for plotting.
  • mnist is imported from tensorflow.keras.datasets to load the MNIST dataset.
  • The required layers (Input, Dense, Reshape, Flatten, Conv2D, Conv2DTranspose) and the Model class are imported from tensorflow.keras.layers and tensorflow.keras.models.
  • The Adam optimizer is imported from tensorflow.keras.optimizers.
  • The EarlyStopping callback is imported from tensorflow.keras.callbacks.

Part 2: Load and preprocess the dataset

(x_train, _), (x_test, _) = mnist.load_data()

In this section, the MNIST dataset is loaded and split into training and test sets. The corresponding labels are ignored and not assigned to any named variable.

Part 3: Preprocessing the dataset

x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0
x_train = np.expand_dims(x_train, axis=-1)
x_test = np.expand_dims(x_test, axis=-1)

In this part, the dataset is preprocessed:

  • The pixel values of the images in x_train and x_test are normalized to the range 0 to 1 by dividing by 255.
  • The dimensions of the input data are expanded with np.expand_dims to include a channel dimension, which is required by the convolution operations (a quick shape check follows below).
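
A quick sanity check of the resulting arrays (the shapes shown are what the standard MNIST split produces):

print(x_train.shape, x_test.shape)   # (60000, 28, 28, 1) (10000, 28, 28, 1)
print(x_train.min(), x_train.max())  # 0.0 1.0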

Part 4: Adding random noise to the training set

noise_factor = 0.5
x_train_noisy = x_train + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=x_train.shape)
x_test_noisy = x_test + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=x_test.shape)
x_train_noisy = np.clip(x_train_noisy, 0., 1.)
x_test_noisy = np.clip(x_test_noisy, 0., 1.)

In this part, random noise is added to the training set:

  • A noise factor of 0.5 is chosen to control the amount of noise.
  • Random noise samples with mean 0 and standard deviation 1 are generated with np.random.normal and then scaled by the noise factor.
  • The noisy training and test sets are obtained by adding this noise to the original data.
  • The pixel values are clipped to ensure they stay within the valid range of 0 to 1.

Part 5: Creating an Autoencoder Model

input_shape = (28, 28, 1)
latent_dim = 128

# Encoder
inputs = Input(shape=input_shape)
x = Conv2D(32, kernel_size=3, strides=2, activation='relu', padding='same')(inputs)
x = Conv2D(64, kernel_size=3, strides=2, activation='relu', padding='same')(x)
x = Flatten()(x)
latent_repr = Dense(latent_dim)(x)

# Decoder
x = Dense(7 * 7 * 64)(latent_repr)
x = Reshape((7, 7, 64))(x)
x = Conv2DTranspose(32, kernel_size=3, strides=2, activation='relu', padding='same')(x)
decoded = Conv2DTranspose(1, kernel_size=3, strides=2, activation='sigmoid', padding='same')(x)

# Autoencoder model
autoencoder = Model(inputs, decoded)

In this part, an autoencoder model is created using an encoder-decoder architecture:

  • input_shape is defined as (28, 28, 1), which represents the shape of the input images.
  • latent_dim is set to 128, which determines the dimensionality of the latent space.
  • Encoder definition:
  • An Input layer is created with the specified input_shape.
  • Two convolutional layers are added, with 32 and 64 filters respectively, kernel size 3x3, stride 2, 'relu' activation and 'same' padding.
  • The output of the convolutional layers is flattened with Flatten().
  • The latent representation is obtained by passing the flattened output through a Dense layer with latent_dim neurons.
  • Decoder definition:
  • A Dense layer with 7 * 7 * 64 neurons is added to match the shape of the last feature map in the encoder.
  • Its output is reshaped to (7, 7, 64) with Reshape.
  • Two transposed convolutional layers are added:
  • The first layer has 32 filters, kernel size 3x3, stride 2, 'relu' activation and 'same' padding.
  • The second layer has 1 filter, kernel size 3x3, stride 2, 'sigmoid' activation and 'same' padding.
  • The autoencoder model is created by specifying the input and output layers (its structure can be inspected with the summary call shown below).
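
Once the model is built, its layer structure and parameter counts can be inspected with Keras' built-in summary:

# Print each layer together with its output shape and number of parameters
autoencoder.summary()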

Part 6: Compile the autoencoder model

autoencoder.compile(optimizer=Adam(learning_rate=0.0002), loss='binary_crossentropy')

In this part, the autoencoder model is compiled:

  • The Adam optimizer is used with a learning rate of 0.0002.
  • The loss function is set to 'binary_crossentropy'.

Part 7: Add early stopping

early_stopping = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)

In this part, an early stopping callback is created:

  • It monitors the validation loss ('val_loss').
  • Stop training if the validation loss does not improve for 5 consecutive epochs.
  • Restores the best weights for the model during training.

Part 8: Training an Autoencoder

epochs = 20
batch_size = 128

history = autoencoder.fit(x_train_noisy, x_train, validation_data=(x_test_noisy, x_test),
                          epochs=epochs, batch_size=batch_size, callbacks=[early_stopping])

In this part, the autoencoder model is trained:

  • The number of epochs is set to 20 and the batch size is set to 128.
  • The training data is (x_train_noisy, x_train) and the validation data is (x_test_noisy, x_test).
  • Training runs for the specified number of epochs with the given batch size and the early stopping callback.
  • The training history is stored in the history variable (its use for plotting the loss curves is sketched below).
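
As a usage example, the stored history can be used to plot the training and validation loss curves; a minimal sketch reusing the matplotlib import from Part 1:

plt.figure()
plt.plot(history.history['loss'], label='training loss')
plt.plot(history.history['val_loss'], label='validation loss')
plt.xlabel('epoch')
plt.ylabel('binary cross-entropy loss')
plt.legend()
plt.show()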

Part 9: Denoise the test image and display the result

denoised_test_images = autoencoder.predict(x_test_noisy)

# Display original, noisy, and denoised images
n = 10
plt.figure(figsize=(20, 4))
for i in range(n):
    # Original images
    ax = plt.subplot(3, n, i + 1)
    plt.imshow(x_test[i].reshape(28, 28), cmap='gray')
    plt.title("Original")
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

    # Noisy images
    ax = plt.subplot(3, n, i + 1 + n)
    plt.imshow(x_test_noisy[i].reshape(28, 28), cmap='gray')
    plt.title("Noisy")
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

    # Denoised images
    ax = plt.subplot(3, n, i + 1 + n + n)
    plt.imshow(denoised_test_images[i].reshape(28, 28), cmap='gray')
    plt.title("Denoised")
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
plt.show()

In this section:

  • The denoised test images are obtained by passing the noisy test images through the trained autoencoder with the predict method.
  • The original, noisy and denoised images are displayed with matplotlib.pyplot.
  • A figure with three rows is created to display the images.
  • For each row, a subplot is created for each of the n images.
  • The original, noisy and denoised images are shown in separate rows of subplots.
  • The axes are hidden and a title is set for each subplot.
  • The resulting figure is displayed with plt.show() (a simple numeric check of the denoising quality follows below).
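
The improvement can also be quantified with a simple per-pixel error; below is a minimal NumPy sketch that compares the noisy and denoised test sets against the clean images (the exact numbers depend on the training run):

# Mean squared error against the clean test images, before and after denoising
mse_noisy = np.mean((x_test - x_test_noisy) ** 2)
mse_denoised = np.mean((x_test - denoised_test_images) ** 2)
print(f"MSE noisy vs. clean:    {mse_noisy:.4f}")
print(f"MSE denoised vs. clean: {mse_denoised:.4f}")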

4. Results


Source: blog.csdn.net/gongdiwudu/article/details/131899072