PyTorch Deep Learning Practice (25) - Autoencoder
0. Preface
An autoencoder is an unsupervised neural network model used for feature extraction and dimensionality reduction. It consists of an encoder and a decoder: the encoder compresses the input data into a low-dimensional representation, and the decoder reconstructs the original data from that representation. In this section, we will learn how to use autoencoders to represent images in a low-dimensional space. Learning to represent images in fewer dimensions makes it easier to modify images, and the low-dimensional representations can also be used to generate new images.
1. Autoencoder
We have learned to classify images by training a model on input images and their corresponding labels. Classification requires a dataset with category labels. If the images in a dataset have no labels and we need to cluster them by similarity, an autoencoder can be used to identify and group similar images.
The autoencoder takes an image as input, stores it in a low-dimensional space, and tries to output the same image through the decoding process without using any additional labels; the "auto" in autoencoder refers to this ability to reproduce its own input. However, if we only needed to reproduce the input at the output, there would be no need for a neural network, since we could simply output the input as is. The value of an autoencoder is that it encodes the image information in a lower-dimensional space, which is why this part of the network is called the encoder. Because the encoding has to preserve the information needed for reconstruction, similar images tend to have similar encodings. The decoder, in turn, reconstructs the image from the encoded vector, reproducing the input image as closely as possible.
Assume that the model input is an MNIST image of a handwritten digit and that the model output is the same as the input image. The middle layer of the network is the encoding layer, also called the bottleneck layer. The operations between the input and the bottleneck layer form the encoder, and the operations between the bottleneck layer and the output form the decoder.
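The encoder/decoder split described above can be sketched in a few lines. This is only an illustration under assumed sizes (one hidden layer of 64 units and a 3-dimensional bottleneck, with random tensors standing in for MNIST images); it is not the network trained later in this article.

```python
import torch
import torch.nn as nn

# Assumed sizes for illustration: 28x28 inputs and a 3-dimensional bottleneck
input_dim, latent_dim = 28 * 28, 3

# Encoder: everything between the input and the bottleneck layer
encoder = nn.Sequential(nn.Linear(input_dim, 64), nn.ReLU(), nn.Linear(64, latent_dim))
# Decoder: everything between the bottleneck layer and the output
decoder = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, input_dim))

images = torch.rand(8, 1, 28, 28)                      # random stand-in for a batch of images
codes = encoder(images.flatten(start_dim=1))           # bottleneck values, one 3-d vector per image
reconstructions = decoder(codes).view(-1, 1, 28, 28)   # same shape as the input batch

print(codes.shape, reconstructions.shape)
```

The only point of the sketch is the shapes: each image is squeezed down to `latent_dim` numbers and then expanded back to the input size.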
Through the bottleneck layer, we can represent an image in a low-dimensional space and reconstruct the original image from it. In other words, the bottleneck layer of an autoencoder helps with both identifying similar images and generating new images. Specifically:
- Images with similar bottleneck layer values (coded representations, also called latent codes) are likely to be similar to each other
- By changing the node values of the bottleneck layer, the output image can be changed.
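Both points can be illustrated with a small sketch. It assumes an untrained stand-in encoder and decoder with made-up layer sizes and random data, so the neighbours it finds are not meaningful; with a trained autoencoder the same operations would be applied to real latent codes.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Untrained stand-ins (hypothetical sizes); a trained autoencoder would be used in practice
encoder = nn.Sequential(nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 3))
decoder = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 784), nn.Tanh())

images = torch.rand(100, 784)  # random stand-in for 100 flattened images

with torch.no_grad():
    codes = encoder(images)                      # (100, 3) latent codes

    # 1) Similarity: Euclidean distance from image 0's code to every other code
    dists = (codes - codes[0]).norm(dim=1)       # (100,)
    nearest = dists.argsort()[:5]                # 5 closest codes; image 0 itself comes first

    # 2) Editing: shift one bottleneck value and decode again
    edited = codes[:1].clone()
    edited[0, 0] += 1.0
    original_out = decoder(codes[:1])
    edited_out = decoder(edited)

print(nearest.tolist())
print((edited_out - original_out).abs().max().item())
```

Images whose codes are close under this distance are candidates for being visually similar, and the decoded output changes when a bottleneck value changes.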
2. Implement autoencoders using PyTorch
In this section, we use PyTorch to build the autoencoder and train it on the MNIST dataset. MNIST is a dataset of handwritten digit images containing 60,000 training samples of 28x28 pixels and 10,000 test samples.
(1) Import the relevant libraries and define the device:
from torchvision.datasets import MNIST
from torchvision import transforms
from torch.utils.data import DataLoader, Dataset
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets
from torchvision.utils import make_grid
import numpy as np
from matplotlib import pyplot as plt
device = 'cuda' if torch.cuda.is_available() else 'cpu'
(2) Specify the image transformation:
img_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize([0.5], [0.5]),
    transforms.Lambda(lambda x: x.to(device))
])
With the above code, the image is converted into a tensor, normalized, and then passed to the device.
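To make the normalization step concrete, the sketch below reproduces the arithmetic that `Normalize([0.5], [0.5])` applies to a single-channel image, namely `(x - mean) / std`. It uses a hand-written tensor instead of a real image, and the variable names are illustrative only. The resulting range of [-1, 1] matches the range of the `Tanh` activation at the decoder output.

```python
import torch

# Pixel values after ToTensor() lie in [0, 1]
pixels = torch.tensor([0.0, 0.25, 0.5, 1.0])

# Normalize([0.5], [0.5]) computes (x - mean) / std for the single channel
mean, std = 0.5, 0.5
normalized = (pixels - mean) / std
print(normalized)  # values: -1.0, -0.5, 0.0, 1.0

# Inverse mapping, useful when displaying reconstructions in the [0, 1] range
restored = normalized * std + mean
```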
(3) Create training and validation datasets:
trn_ds = MNIST('MNIST/', transform=img_transform, train=True, download=True)
val_ds = MNIST('MNIST/', transform=img_transform, train=False, download=True)
(4) Define the data loaders:
batch_size = 256
trn_dl = DataLoader(trn_ds, batch_size=batch_size, shuffle=True)
val_dl = DataLoader(val_ds, batch_size=batch_size, shuffle=False)
(5) Define the network architecture: the AutoEncoder class with its __init__ and forward methods, then print the model summary information.
Define the AutoEncoder class and its __init__ method, which takes the bottleneck dimension latent_dim and builds the encoder and decoder:
class AutoEncoder(nn.Module):
    def __init__(self, latent_dim):
        super().__init__()
        # Note: the attribute is spelled latend_dim; later code reads it under this name
        self.latend_dim = latent_dim
        self.encoder = nn.Sequential(
            nn.Linear(28 * 28, 128), nn.ReLU(True),
            nn.Linear(128, 64), nn.ReLU(True),
            #nn.Linear(64, 12), nn.ReLU(True),
            nn.Linear(64, latent_dim))
        self.decoder = nn.Sequential(
            #nn.Linear(latent_dim, 12), nn.ReLU(True),
            nn.Linear(latent_dim, 64), nn.ReLU(True),
            nn.Linear(64, 128), nn.ReLU(True),
            nn.Linear(128, 28 * 28), nn.Tanh())
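Define the forward pass in the forward method:
    def forward(self, x):
        x = x.view(len(x), -1)          # flatten each image to a 784-dimensional vector
        x = self.encoder(x)             # compress to the bottleneck representation
        x = self.decoder(x)             # reconstruct the flattened image
        x = x.view(len(x), 1, 28, 28)   # restore the image shape
        return x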
Print model summary information:
from torchsummary import summary
model = AutoEncoder(3).to(device)
print(summary(model, (1,28,28)))
The model architecture information output is as follows:
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Linear-1                [-1, 128]         100,480
              ReLU-2                [-1, 128]               0
            Linear-3                 [-1, 64]           8,256
              ReLU-4                 [-1, 64]               0
            Linear-5                  [-1, 3]             195
            Linear-6                 [-1, 64]             256
              ReLU-7                 [-1, 64]               0
            Linear-8                [-1, 128]           8,320
              ReLU-9                [-1, 128]               0
           Linear-10                [-1, 784]         101,136
             Tanh-11                [-1, 784]               0
================================================================
Total params: 218,643
Trainable params: 218,643
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.02
Params size (MB): 0.83
Estimated Total Size (MB): 0.85
----------------------------------------------------------------
From the output above, you can see that the Linear-5 layer is the bottleneck layer, which represents each image as a 3-dimensional vector; the decoder then reconstructs the original image from this 3-dimensional vector.
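The parameter counts in the summary can be checked by hand: a fully connected layer with `n_in` inputs and `n_out` outputs has `n_in * n_out` weights plus `n_out` biases. The short sketch below redoes that arithmetic for the six Linear layers listed above (the variable names are illustrative only).

```python
# (in_features, out_features) of each Linear layer when latent_dim = 3
layer_shapes = [(784, 128), (128, 64), (64, 3), (3, 64), (64, 128), (128, 784)]

# Weights (n_in * n_out) plus biases (n_out) for each layer
param_counts = [n_in * n_out + n_out for n_in, n_out in layer_shapes]
total_params = sum(param_counts)

print(param_counts)  # [100480, 8256, 195, 256, 8320, 101136]
print(total_params)  # 218643
```

The two layers next to the bottleneck hold only 195 and 256 parameters, which shows how narrow the 3-dimensional bottleneck is compared with the rest of the network.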
(6) Define the function train_batch() to train the model on a batch of data:
def train_batch(input, model, criterion, optimizer):
    model.train()
    optimizer.zero_grad()
    output = model(input)
    # The target is the input itself: the loss measures reconstruction error
    loss = criterion(output, input)
    loss.backward()
    optimizer.step()
    return loss
(7) Define the function validate_batch() to validate the model on a batch of data:
@torch.no_grad()
def validate_batch(input, model, criterion):
    model.eval()
    output = model(input)
    loss = criterion(output, input)
    return loss
(8) Define the model, loss function and optimizer:
model = AutoEncoder(3).to(device)
criterion = nn.MSELoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=0.001, weight_decay=1e-5)
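As a quick check of what the criterion in step (8) measures, the sketch below compares `nn.MSELoss()` (with its default `reduction='mean'`) against the mean of squared per-pixel differences computed by hand. The tensors are random stand-ins for a batch of inputs and reconstructions, not real model outputs.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Random stand-ins scaled to [-1, 1], the range of the normalized inputs
target = torch.rand(4, 1, 28, 28) * 2 - 1   # plays the role of the input batch
output = torch.rand(4, 1, 28, 28) * 2 - 1   # plays the role of the reconstructions

criterion = nn.MSELoss()                     # default reduction='mean'
loss = criterion(output, target)

# Same quantity by hand: mean of squared per-pixel differences
manual = ((output - target) ** 2).mean()

print(loss.item(), manual.item())
```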
(9) Train the model:
num_epochs = 20
train_loss_epochs = []
val_loss_epochs = []
for epoch in range(num_epochs):
    trn_loss = []
    val_loss = []
    # Train on every batch; the labels are ignored
    for data, _ in trn_dl:
        loss = train_batch(data, model, criterion, optimizer)
        trn_loss.append(loss.item())
    train_loss_epochs.append(np.average(trn_loss))
    # Evaluate on the validation set
    for data, _ in val_dl:
        loss = validate_batch(data, model, criterion)
        val_loss.append(loss.item())
    val_loss_epochs.append(np.average(val_loss))
(10) Visualize the training and validation losses of the model over time during training:
epochs = np.arange(num_epochs)+1
plt.plot(epochs, train_loss_epochs, 'bo', label='Training loss')
plt.plot(epochs, val_loss_epochs, 'r-', label='Test loss')
plt.title('Training and Test loss over increasing epochs')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.grid(False)
plt.show()
(11) Use the test dataset val_ds to validate the model:
for _ in range(5):
    # Pick a random test image and reconstruct it
    ix = np.random.randint(len(val_ds))
    im, _ = val_ds[ix]
    _im = model(im[None])[0]
    plt.subplot(121)
    plt.imshow(im[0].detach().cpu(), cmap='gray')
    plt.title('input')
    plt.subplot(122)
    plt.imshow(_im[0].detach().cpu(), cmap='gray')
    plt.title('prediction')
    plt.show()
We can see that even though the bottleneck layer has only three dimensions, the network can reproduce the input reasonably well. The reconstructions are blurrier than expected, however, mainly because the number of nodes in the bottleneck layer is too small. The following code trains networks with different bottleneck layer sizes (2, 3, 5, 10 and 50) and visualizes their reconstructed images:
def train_aec(latent_dim):
    model = AutoEncoder(latent_dim).to(device)
    criterion = nn.MSELoss()
    optimizer = torch.optim.AdamW(model.parameters(), lr=0.001, weight_decay=1e-5)
    num_epochs = 20
    train_loss_epochs = []
    val_loss_epochs = []
    for epoch in range(num_epochs):
        trn_loss = []
        val_loss = []
        for data, _ in trn_dl:
            loss = train_batch(data, model, criterion, optimizer)
            trn_loss.append(loss.item())
        train_loss_epochs.append(np.average(trn_loss))
        for data, _ in val_dl:
            loss = validate_batch(data, model, criterion)
            val_loss.append(loss.item())
        val_loss_epochs.append(np.average(val_loss))
    # Plot the loss curves for this bottleneck size
    epochs = np.arange(num_epochs)+1
    plt.plot(epochs, train_loss_epochs, 'bo', label='Training loss')
    plt.plot(epochs, val_loss_epochs, 'r-', label='Test loss')
    plt.title('Training and Test loss over increasing epochs')
    plt.xlabel('Epochs')
    plt.ylabel('Loss')
    plt.legend()
    plt.grid(False)
    plt.show()
    return model
aecs = [train_aec(dim) for dim in [50, 2, 3, 5, 10]]
for _ in range(10):
    ix = np.random.randint(len(val_ds))
    im, _ = val_ds[ix]
    # First panel: the input image
    plt.subplot(1, len(aecs)+1, 1)
    plt.imshow(im[0].detach().cpu(), cmap='gray')
    plt.title('input')
    idx = 2
    # Remaining panels: one reconstruction per bottleneck size
    for model in aecs:
        _im = model(im[None])[0]
        plt.subplot(1, len(aecs)+1, idx)
        plt.imshow(_im[0].detach().cpu(), cmap='gray')
        plt.title(f'prediction\nlatent-dim:{model.latend_dim}')
        idx += 1
    plt.show()
As the vector dimension in the bottleneck layer increases, the clarity of the reconstructed image gradually improves.
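The visual impression can be backed by a number: the average reconstruction error of each model on held-out data. The helper below is a hypothetical sketch (the name `mean_reconstruction_error` does not come from the original code); it is demonstrated here on random tensors with an identity model, whose error is zero by construction.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def mean_reconstruction_error(model, batches):
    """Average per-batch MSE between each input batch and its reconstruction."""
    criterion = nn.MSELoss()
    model.eval()
    losses = [criterion(model(x), x).item() for x in batches]
    return sum(losses) / len(losses)

# Stand-in data: three random batches shaped like MNIST images
batches = [torch.rand(16, 1, 28, 28) for _ in range(3)]

# An identity "model" reconstructs its input perfectly, so the error is 0
identity_error = mean_reconstruction_error(nn.Identity(), batches)
print(identity_error)
```

With the trained models, it could be called as `mean_reconstruction_error(m, (x for x, _ in val_dl))` for each `m` in `aecs`. If the visual trend holds, the error should decrease as the bottleneck dimension grows.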
Summary
The autoencoder is an unsupervised learning neural network model used for feature extraction and dimensionality reduction of data. It consists of an encoder and a decoder, which achieves feature extraction and data dimensionality reduction by compressing the input data into a low-dimensional representation and trying to reconstruct the original data. During the training process of the autoencoder, the goal is to minimize the reconstruction error between the input data and the reconstructed data so that the encoder captures the key features of the data. Autoencoders play an important role in unsupervised learning and deep learning, capable of learning useful features from data and providing support for subsequent machine learning tasks.
Series links
PyTorch Deep Learning Practice (1) - Detailed explanation of the neural network and model training process
PyTorch Deep Learning Practice (2) - PyTorch Basics
PyTorch Deep Learning Practice (3) - Use PyTorch to build neural networks
PyTorch Deep Learning Practice (4) - Detailed explanation of commonly used activation functions and loss functions
PyTorch Deep Learning Practice (5) - Basics of Computer Vision
PyTorch Deep Learning Practice (6) - Neural Network Performance Optimization Technology
PyTorch Deep Learning Practice (7) - The impact of batch size on neural network training
PyTorch Deep Learning Practice (8) - Batch Normalization
PyTorch Deep Learning Practice (9) - Learning Rate Optimization
PyTorch Deep Learning Practice (10) - Overfitting and its Solution
PyTorch Deep Learning Practice (11) - Convolutional Neural Network
PyTorch Deep Learning Practice (12) - Data Enhancement
PyTorch Deep Learning Practice (13) - Visualizing the Output of the Middle Layer of the Neural Network
PyTorch Deep Learning Practice (14) - Class Activation Diagram
PyTorch Deep Learning Practice (15) - Transfer Learning
PyTorch Deep Learning Practice (16) - Facial Key Point Detection
PyTorch Deep Learning Practice (17) - Multi-task Learning
PyTorch Deep Learning Practice (18) - Basics of Target Detection
PyTorch Deep Learning Practice (19) - Implementing R-CNN target detection from scratch
PyTorch Deep Learning Practice (20) - Implementing Fast R-CNN target detection from scratch
PyTorch Deep Learning Practice (21) - Implementing Faster R-CNN target detection from scratch
PyTorch Deep Learning Practice (22) - Implementing YOLO target detection from scratch
PyTorch Deep Learning Practice (23) - Using U-Net architecture for image segmentation
PyTorch Deep Learning Practice (24) - Implementing Mask R-CNN instance segmentation from scratch