PyTorch Deep Learning Practice (25) - Autoencoder

0. Preface

An autoencoder is an unsupervised learning neural network model used for feature extraction and dimensionality reduction. It consists of an encoder and a decoder: the encoder compresses the input data into a low-dimensional representation, and the decoder reconstructs the original data from that representation. In this section, we will learn how to use autoencoders to represent images in a low-dimensional space. Learning to represent images in fewer dimensions helps with modifying images, and the low-dimensional representations can also be used to generate new images.

1. Autoencoder

We have learned to classify images by training a model with input images and their corresponding labels. A prerequisite for classification is a dataset with category labels. Now suppose the dataset contains no labels for the images, and we need to cluster them based on their similarity; in this case, an autoencoder can easily identify and group similar images.
An autoencoder takes an image as input, stores it in a low-dimensional space, and tries to output the same image through the decoding process, without using any additional labels; the "Auto" in AutoEncoder indicates that it reproduces its own input. However, if we simply needed to reproduce the input in the output, there would be no need for a neural network; we could just output the input as-is. The value of an autoencoder is that it encodes the image information in a lower dimension, which is why it is called an encoder (it encodes image information into a lower-dimensional space). As a result, similar images have similar encodings. The decoder then works to reconstruct the original image from the encoded vector, reproducing the input image as closely as possible:

Autoencoder architecture

Assume the model's input is an MNIST image of a handwritten digit, and the model's output should be the same as its input. The middle network layer is the encoding layer, also called the bottleneck layer. The operations between the input and the bottleneck layer constitute the encoder, and the operations between the bottleneck layer and the output constitute the decoder.
Through the bottleneck layer, we can represent each image in a low-dimensional space and reconstruct the original image from it. In other words, the bottleneck layer of an autoencoder can solve both the problem of identifying similar images and the problem of generating new images, specifically:

  • Images with similar bottleneck layer values (encoded representations, also called latent codes) are likely to be similar to each other
  • By changing the node values of the bottleneck layer, the output image can be changed

2. Implement autoencoders using PyTorch

In this section, we use PyTorch to build the autoencoder and train it on the MNIST dataset, an image dataset of handwritten digits containing 60,000 28x28-pixel training samples and 10,000 test samples.

(1) Import the relevant libraries and define the device:

from torchvision.datasets import MNIST
from torchvision import transforms
from torch.utils.data import DataLoader
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
from matplotlib import pyplot as plt
device = 'cuda' if torch.cuda.is_available() else 'cpu'

(2) Specify the image transformation pipeline:

img_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize([0.5], [0.5]),
    transforms.Lambda(lambda x: x.to(device))
])

With the above code, each image is converted into a tensor, normalized, and then moved to the computation device.
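As a quick sanity check (optional; a sketch assuming the dataset is already downloaded), applying the transform to one raw MNIST image should yield a 1x28x28 tensor with values in [-1, 1] on the chosen device:

raw_img, _ = MNIST('MNIST/', train=True, download=True)[0]  # raw PIL image and its label
t = img_transform(raw_img)
print(t.shape, t.min().item(), t.max().item(), t.device)
# e.g. torch.Size([1, 28, 28]) -1.0 1.0 cuda:0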

(3) Create training and validation datasets:

trn_ds = MNIST('MNIST/', transform=img_transform, train=True, download=True)
val_ds = MNIST('MNIST/', transform=img_transform, train=False, download=True)
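An optional check confirms the split sizes mentioned earlier:

print(len(trn_ds), len(val_ds))  # 60000 10000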

(4) Define the batch size and create the data loaders:

batch_size = 256
trn_dl = DataLoader(trn_ds, batch_size=batch_size, shuffle=True)
val_dl = DataLoader(val_ds, batch_size=batch_size, shuffle=False)
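To verify the loaders (optional), fetch one batch and inspect its shape; thanks to the Lambda transform, the batch is already on the device:

x, y = next(iter(trn_dl))
print(x.shape, y.shape)  # torch.Size([256, 1, 28, 28]) torch.Size([256])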

(5) Define the network architecture: the AutoEncoder class, with an __init__ method that takes the bottleneck dimension latent_dim and a forward method, then print the model summary.

Define the AutoEncoder class and its __init__ method, containing the encoder, the decoder, and the bottleneck layer dimension:

class AutoEncoder(nn.Module):
    def __init__(self, latent_dim):
        super().__init__()
        self.latent_dim = latent_dim
        # Encoder: compress the flattened 784-pixel image down to latent_dim values
        self.encoder = nn.Sequential(
            nn.Linear(28 * 28, 128), nn.ReLU(True),
            nn.Linear(128, 64), nn.ReLU(True),
            # nn.Linear(64, 12), nn.ReLU(True),  # optional extra layer
            nn.Linear(64, latent_dim))
        # Decoder: reconstruct the 784-pixel image from the latent code;
        # Tanh keeps outputs in [-1, 1], matching the Normalize transform
        self.decoder = nn.Sequential(
            # nn.Linear(latent_dim, 12), nn.ReLU(True),  # optional extra layer
            nn.Linear(latent_dim, 64), nn.ReLU(True),
            nn.Linear(64, 128), nn.ReLU(True),
            nn.Linear(128, 28 * 28), nn.Tanh())

Define the forward method:

    def forward(self, x):
        x = x.view(len(x), -1)          # flatten each image to a 784-dim vector
        x = self.encoder(x)             # compress to the latent code
        x = self.decoder(x)             # reconstruct the flattened image
        x = x.view(len(x), 1, 28, 28)   # reshape back to 1x28x28
        return x
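Before printing the summary, a quick forward-pass check with random input (illustrative only; the values are meaningless) confirms that the output shape matches the input shape:

x = torch.randn(4, 1, 28, 28).to(device)
print(AutoEncoder(3).to(device)(x).shape)  # torch.Size([4, 1, 28, 28])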

Print model summary information:

from torchsummary import summary
model = AutoEncoder(3).to(device)
summary(model, (1, 28, 28))

The model architecture information output is as follows:

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Linear-1                  [-1, 128]         100,480
              ReLU-2                  [-1, 128]               0
            Linear-3                   [-1, 64]           8,256
              ReLU-4                   [-1, 64]               0
            Linear-5                    [-1, 3]             195
            Linear-6                   [-1, 64]             256
              ReLU-7                   [-1, 64]               0
            Linear-8                  [-1, 128]           8,320
              ReLU-9                  [-1, 128]               0
           Linear-10                  [-1, 784]         101,136
             Tanh-11                  [-1, 784]               0
================================================================
Total params: 218,643
Trainable params: 218,643
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.02
Params size (MB): 0.83
Estimated Total Size (MB): 0.85
----------------------------------------------------------------

From the above output, you can see that the Linear-5 layer is the bottleneck layer: it represents each 784-pixel image as a 3-dimensional vector. The decoder then reconstructs the original image from this 3-dimensional vector.

(6) Define the train_batch() function to train the model on a batch of data:

def train_batch(input, model, criterion, optimizer):
    model.train()
    optimizer.zero_grad()
    output = model(input)
    # reconstruction loss: the target is the input image itself
    loss = criterion(output, input)
    loss.backward()
    optimizer.step()
    return loss

(7) Define the validate_batch() function to evaluate the model on a batch of data:

@torch.no_grad()
def validate_batch(input, model, criterion):
    model.eval()
    output = model(input)
    loss = criterion(output, input)
    return loss

(8) Define the model, loss function and optimizer:

model = AutoEncoder(3).to(device)
criterion = nn.MSELoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=0.001, weight_decay=1e-5)

(9) Train the model:

num_epochs = 20
train_loss_epochs = []
val_loss_epochs = []
for epoch in range(num_epochs):
    trn_loss = []
    val_loss = []
    for data, _ in trn_dl:
        loss = train_batch(data, model, criterion, optimizer)
        trn_loss.append(loss.item())
    train_loss_epochs.append(np.average(trn_loss))

    for data, _ in val_dl:
        loss = validate_batch(data, model, criterion)
        val_loss.append(loss.item())
    val_loss_epochs.append(np.average(val_loss))
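Before visualizing the losses, we can peek at what the bottleneck produces by running only the encoder half of the trained model; this yields the 3-dimensional latent code for an image (an optional sketch; the exact values depend on training):

with torch.no_grad():
    im, _ = val_ds[0]
    latent = model.encoder(im.view(1, -1))  # flatten, then encode only
print(latent.shape)  # torch.Size([1, 3])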

(10) Visualize the training and validation losses of the model over time during training:

epochs = np.arange(num_epochs)+1
plt.plot(epochs, train_loss_epochs, 'bo', label='Training loss')
plt.plot(epochs, val_loss_epochs, 'r-', label='Test loss')
plt.title('Training and Test loss over increasing epochs')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.grid(False)
plt.show()

Training and test loss curves
(11) Use the test data set val_ds to validate the model:

for _ in range(5):
    ix = np.random.randint(len(val_ds))
    im, _ = val_ds[ix]
    _im = model(im[None])[0]
    plt.subplot(121)
    plt.imshow(im[0].detach().cpu(), cmap='gray')
    plt.title('input')
    plt.subplot(122)
    plt.imshow(_im[0].detach().cpu(), cmap='gray')
    plt.title('prediction')
    plt.show()  # display each input/prediction pair as its own figure

Reconstructed images
We can see that even though the bottleneck layer has only three dimensions, the network can reproduce the input quite accurately; however, the images are not as sharp as expected, mainly because the bottleneck layer has too few nodes. After training networks with different bottleneck sizes (2, 3, 5, 10, and 50), we can compare the reconstructed images visually:

def train_aec(latent_dim):
    model = AutoEncoder(latent_dim).to(device)
    criterion = nn.MSELoss()
    optimizer = torch.optim.AdamW(model.parameters(), lr=0.001, weight_decay=1e-5)

    num_epochs = 20
    train_loss_epochs = []
    val_loss_epochs = []

    for epoch in range(num_epochs):
        trn_loss = []
        val_loss = []
        for data, _ in trn_dl:
            loss = train_batch(data, model, criterion, optimizer)
            trn_loss.append(loss.item())
        train_loss_epochs.append(np.average(trn_loss))

        for data, _ in val_dl:
            loss = validate_batch(data, model, criterion)
            val_loss.append(loss.item())
        val_loss_epochs.append(np.average(val_loss))

    epochs = np.arange(num_epochs) + 1
    plt.plot(epochs, train_loss_epochs, 'bo', label='Training loss')
    plt.plot(epochs, val_loss_epochs, 'r-', label='Test loss')
    plt.title('Training and Test loss over increasing epochs')
    plt.xlabel('Epochs')
    plt.ylabel('Loss')
    plt.legend()
    plt.grid(False)
    plt.show()
    return model

aecs = [train_aec(dim) for dim in [50, 2, 3, 5, 10]]

for _ in range(10):
    ix = np.random.randint(len(val_ds))
    im, _ = val_ds[ix]
    plt.subplot(1, len(aecs)+1, 1)
    plt.imshow(im[0].detach().cpu(), cmap='gray')
    plt.title('input')
    idx = 2
    for model in aecs:
        _im = model(im[None])[0]
        plt.subplot(1, len(aecs)+1, idx)
        plt.imshow(_im[0].detach().cpu(), cmap='gray')
        plt.title(f'prediction\nlatent-dim:{model.latent_dim}')
        idx += 1
    plt.show()  # one row per sample: the input followed by each model's reconstruction

Reconstructions for different bottleneck dimensions

As the vector dimension in the bottleneck layer increases, the clarity of the reconstructed image gradually improves.
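Finally, since the decoder maps any latent vector to an image, we can sketch how new images might be generated by decoding interpolations between the latent codes of two real images. This is illustrative only: aecs[2] is the latent-dim 3 model trained above, and the quality of the generated digits depends on training:

ae = aecs[2]  # the latent_dim=3 model from the list above
im1, _ = val_ds[0]
im2, _ = val_ds[1]
with torch.no_grad():
    z1 = ae.encoder(im1.view(1, -1))
    z2 = ae.encoder(im2.view(1, -1))
    for i, alpha in enumerate(np.linspace(0, 1, 5)):
        z = (1 - float(alpha)) * z1 + float(alpha) * z2  # blend the two codes
        gen = ae.decoder(z).view(28, 28)                 # decode to an image
        plt.subplot(1, 5, i + 1)
        plt.imshow(gen.cpu(), cmap='gray')
        plt.axis('off')
plt.show()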

Summary

The autoencoder is an unsupervised learning neural network model used for feature extraction and dimensionality reduction. It consists of an encoder and a decoder, and works by compressing the input data into a low-dimensional representation and then trying to reconstruct the original data from it. During training, the goal is to minimize the reconstruction error between the input data and the reconstructed data, so that the encoder captures the key features of the data. Autoencoders play an important role in unsupervised learning and deep learning: they can learn useful features from data and provide support for subsequent machine learning tasks.

Series link

PyTorch Deep Learning Practice (1) - Detailed Explanation of the Neural Network and Model Training Process
PyTorch Deep Learning Practice (2) - PyTorch Basics
PyTorch Deep Learning Practice (3) - Using PyTorch to Build Neural Networks
PyTorch Deep Learning Practice (4) - Detailed Explanation of Commonly Used Activation Functions and Loss Functions
PyTorch Deep Learning Practice (5) - Basics of Computer Vision
PyTorch Deep Learning Practice (6) - Neural Network Performance Optimization Techniques
PyTorch Deep Learning Practice (7) - The Impact of Batch Size on Neural Network Training
PyTorch Deep Learning Practice (8) - Batch Normalization
PyTorch Deep Learning Practice (9) - Learning Rate Optimization
PyTorch Deep Learning Practice (10) - Overfitting and Its Solutions
PyTorch Deep Learning Practice (11) - Convolutional Neural Networks
PyTorch Deep Learning Practice (12) - Data Augmentation
PyTorch Deep Learning Practice (13) - Visualizing the Output of the Middle Layers of a Neural Network
PyTorch Deep Learning Practice (14) - Class Activation Maps
PyTorch Deep Learning Practice (15) - Transfer Learning
PyTorch Deep Learning Practice (16) - Facial Key Point Detection
PyTorch Deep Learning Practice (17) - Multi-Task Learning
PyTorch Deep Learning Practice (18) - Basics of Object Detection
PyTorch Deep Learning Practice (19) - Implementing R-CNN Object Detection from Scratch
PyTorch Deep Learning Practice (20) - Implementing Fast R-CNN Object Detection from Scratch
PyTorch Deep Learning Practice (21) - Implementing Faster R-CNN Object Detection from Scratch
PyTorch Deep Learning Practice (22) - Implementing YOLO Object Detection from Scratch
PyTorch Deep Learning Practice (23) - Using the U-Net Architecture for Image Segmentation
PyTorch Deep Learning Practice (24) - Implementing Mask R-CNN Instance Segmentation from Scratch
Origin: blog.csdn.net/LOVEmy134611/article/details/132133874