Autoencoder: basic principles and model implementation

An autoencoder is a method for embedding and representation learning. As a deep-neural-network approach, it is mainly used for dimensionality reduction, compression, and obtaining low-dimensional representations of data. Autoencoders serve the same purpose as traditional dimensionality-reduction methods such as Principal Component Analysis (PCA) and Factor Analysis (FA), but they are more flexible and often give better results.

One: Basic principles

For an autoencoder, the structure can be divided into two parts: an encoder and a decoder. The encoder can be a neural network composed of convolution, pooling, and fully connected layers (to reduce the dimension, the convolutions generally downsample, or the feature map is flattened into a 1-dimensional tensor and fed through fully connected layers whose number of neurons decreases layer by layer), and its output dimension is much smaller than the input: for example, a 500×500 image might pass through the encoder and come out as a 150×10 sequence, i.e. data of much smaller dimension. The decoder can likewise be a combination of convolution, fully connected, and other layers (to increase the dimension, the convolutions usually upsample, or the number of neurons in the fully connected layers increases layer by layer); it takes the output of the encoder as its input, and its final output has exactly the same dimensions as the original input.

The key to training the network is to make the input and the output as similar as possible (that is, to make the loss as small as possible), so the loss function compares the data with the output obtained by passing that data through the autoencoder. Once training is finished, feeding a sample into the autoencoder and taking the encoder's output embedding gives the representation of that sample.
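In code, the whole idea boils down to minimizing a reconstruction loss. Below is a minimal illustrative sketch (a toy example of my own, not the model built later in this post; the class name and layer sizes are made up):

import torch
from torch import nn

# Minimal illustrative autoencoder (hypothetical toy example)
class SimpleAutoencoder(nn.Module):
    def __init__(self,in_dim=784,code_dim=32):
        super().__init__()
        self.encoder=nn.Sequential(nn.Linear(in_dim,128),nn.ReLU(),
                                   nn.Linear(128,code_dim)) # low-dimensional code
        self.decoder=nn.Sequential(nn.Linear(code_dim,128),nn.ReLU(),
                                   nn.Linear(128,in_dim)) # reconstruct the input
    def forward(self,x):
        return self.decoder(self.encoder(x))

x=torch.randn(4,784) # a dummy batch
model=SimpleAutoencoder()
loss=nn.MSELoss()(model(x),x) # reconstruction loss: compare output with input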

The figure above shows the architecture of a basic autoencoder.

Two: Model implementation

Here Tutu uses molecular image data created with RDKit as an example (readers can also choose other images for training, and feel free to define a model of your own to try first).

import os
import numpy as np
import torch
from torch import nn
from torch.utils.data import DataLoader
from rdkit import Chem
from rdkit.Chem import Draw
from PIL import Image

class dataset:
    '''Training dataset'''
    def __init__(self):
        smis=['C=C','C(=O)C','CNC','CCC(=O)C',
              'C1=CC=CC=C1','C#CC','O=C=O','CCCO',
              'N#N','C=CC=CO','NC(=O)C','OCCOCC']
        '''Create the molecular image dataset and save it in the Data folder'''
        os.makedirs('Data',exist_ok=True) # make sure the folder exists
        for i in range(len(smis)):
            mol=Chem.MolFromSmiles(smis[i])
            img=Draw.MolToImage(mol,size=(50,50))
            img.save('Data/img{}.png'.format(i))
        traindata=[]
        for i in range(len(smis)):
            '''Convert each image into a 50x50x3 array'''
            traindata.append(np.array(Image.open('Data/img{}.png'.format(i))))
        self.traindata=torch.tensor(np.array(traindata),dtype=torch.float32)
        self.n=len(smis)
    def __len__(self):
        return self.n
    def __getitem__(self, item):
        return self.traindata[item]

class Autuencoder(nn.Module):
    '''Autoencoder'''
    def __init__(self):
        super().__init__()
        self.encode=nn.Sequential(nn.Conv2d(in_channels=3,out_channels=1,kernel_size=(5,5),stride=1,padding=0), # (b,3,50,50) -> (b,1,46,46)
                                  nn.ReLU(),
                                  nn.MaxPool2d(kernel_size=(5,5),stride=1,padding=0), # -> (b,1,42,42)
                                  nn.Conv2d(in_channels=1,out_channels=3,kernel_size=(5,5),stride=1,padding=0), # -> (b,3,38,38)
                                  nn.ReLU(),
                                  nn.MaxPool2d(kernel_size=(5,5),stride=1,padding=0), # -> (b,3,34,34)
                                  nn.Flatten(start_dim=2,end_dim=3), # -> (b,3,1156), since 34*34=1156
                                  nn.Linear(1156,1000),
                                  nn.ReLU(),
                                  nn.Linear(1000,800),
                                  nn.ReLU()) # encoder output: (b,3,800)
        self.decode=nn.Sequential(nn.Linear(800,2000),
                                  nn.ReLU(),
                                  nn.Linear(2000,2500)) # -> (b,3,2500), 2500=50*50

    def forward(self,input):
        out=self.encode(input)
        out=self.decode(out)
        b,c,_=out.shape
        out=out.view(b,c,50,50) # reshape back to image dimensions
        return out

if __name__=='__main__':
    epochs=30
    batch_size=2
    dataloder=DataLoader(dataset(),shuffle=True,batch_size=batch_size) # load the data
    auto=Autuencoder()
    optim=torch.optim.Adam(params=auto.parameters())
    Loss=nn.MSELoss() # loss function
    for i in range(epochs):
        for data in dataloder:
            data=data.permute(0,3,1,2) # (b,50,50,3) -> (b,3,50,50)
            yp=auto(data)
            loss=Loss(yp,data) # reconstruction loss between output and input
            optim.zero_grad()
            loss.backward()
            optim.step()
    torch.save(auto,'autoencoder.pkl') # save the model
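As a quick sanity check on the shapes (a small addition of mine, assuming the model above), we can pass a dummy batch through the trained network:

# Sanity check with a dummy batch (illustrative)
x=torch.randn(2,3,50,50)
print(auto.encode(x).shape) # torch.Size([2, 3, 800]): the compressed representation
print(auto(x).shape) # torch.Size([2, 3, 50, 50]): same shape as the input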

The structure of the model above is shown in the figure below.

Because of the limited computing power and memory of Tutu's computer, neither the image dimensions nor the dataset size can be very large, which is why the data and model above were chosen.

After training, we take img9 as an example.

import matplotlib.pyplot as plt
from ex1 import Autuencoder
from PIL import Image
import torch
import numpy as np

auto=torch.load('autoencoder.pkl')
input=np.array(Image.open('Data/img9.png'))
input=torch.tensor(np.array([input]),dtype=torch.float32).permute(0,3,1,2) # (1,3,50,50)
out=auto(input)
out=out.permute(0,2,3,1)[0] # back to (50,50,3)
out=out.detach().numpy()
img=Image.fromarray(np.clip(out,0,255).astype('uint8')) # clip to the valid pixel range before casting
plt.imshow(img)
plt.show()
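To judge the reconstruction more easily, the input and the output can also be plotted side by side (a small variation of mine on the script above):

# Optional: show input and reconstruction side by side
fig,axes=plt.subplots(1,2)
axes[0].imshow(Image.open('Data/img9.png'))
axes[0].set_title('input')
axes[1].imshow(img)
axes[1].set_title('reconstruction')
plt.show()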

Feeding the img9 data into the trained autoencoder produces the output shown in the figure below.

Ideally the input and output images would be identical; clearly the training here is insufficient (or the model and its parameters are problematic; for small-sample data like this, perhaps dropping the convolutions and using only fully connected layers would train faster and give better results).
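For reference, a purely fully connected variant as suggested above might look like the following (an untrained sketch with layer sizes of my own choosing; 7500 = 50 x 50 x 3):

# Purely fully connected autoencoder (illustrative sketch, sizes chosen arbitrarily)
class MLPAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encode=nn.Sequential(nn.Flatten(), # (b,3,50,50) -> (b,7500)
                                  nn.Linear(7500,1000),nn.ReLU(),
                                  nn.Linear(1000,200)) # 200-dimensional code
        self.decode=nn.Sequential(nn.Linear(200,1000),nn.ReLU(),
                                  nn.Linear(1000,7500))
    def forward(self,input):
        out=self.decode(self.encode(input))
        return out.view(-1,3,50,50) # reshape back to image dimensions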

Once the autoencoder is trained, we can also obtain the compressed encoded representation of the image.

auto=torch.load('autoencoder.pkl')
input=np.array(Image.open('Data/img9.png'))
input=torch.tensor(np.array([input]),dtype=torch.float32).permute(0,3,1,2)
out=auto.encode(input) # run only the encoder to get the compressed representation
print(out)

Note that the encoding here is not a one-dimensional tensor (its shape is (1, 3, 800)), which is perfectly acceptable in practical applications.
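If a one-dimensional vector is preferred, for example as a feature vector for downstream models, the encoding can simply be flattened:

code=auto.encode(input) # shape (1, 3, 800)
vec=code.flatten(start_dim=1) # shape (1, 2400): a single 2400-dimensional vector
print(vec.shape)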

Imagine:

Suppose that for some molecular database we obtain images of the molecules (or process the downloaded files into molecular images) and use an autoencoder to learn representations of the molecules (molecular descriptors). Is this feasible? The answer is no. The same molecule can have many different images: translation, rotation, or different structure-drawing conventions all change the image while leaving the molecule the same, so the encoding produced by the encoder can only represent the picture, not the molecule. For molecular structures, methods such as graph neural networks or structure-based point-cloud networks are still the way to go.
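RDKit makes this point easy to verify: two different SMILES strings, which may well be drawn as different pictures, can denote the exact same molecule, as canonicalization shows:

from rdkit import Chem

# Two different SMILES for ethanol may render as different images,
# but canonicalization confirms they are the same molecule.
a=Chem.MolToSmiles(Chem.MolFromSmiles('CCO')) # ethanol written C-C-O
b=Chem.MolToSmiles(Chem.MolFromSmiles('OCC')) # ethanol written O-C-C
print(a,b,a==b) # CCO CCO True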

Three: Summary

As an embedding and representation-learning method, the autoencoder is widely used in practice. It can effectively reduce the dimensionality of data and learn representations of it, and those representations can then be used to tackle downstream supervised learning problems.
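As a concrete illustration of that last point, the trained encoder's output can serve as input features for a small supervised model (a hypothetical sketch: `images`, the classifier, and the 2-class setup are invented for illustration):

# Hypothetical sketch: frozen encoder output as features for a downstream classifier
auto=torch.load('autoencoder.pkl')
with torch.no_grad():
    feats=auto.encode(images).flatten(start_dim=1) # (n, 2400) feature vectors
clf=nn.Linear(feats.shape[1],2) # e.g. a tiny 2-class classifier head
logits=clf(feats)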
