Face Generation with Autoencoders on Amazon SageMaker

1. Introduction

I was recently invited to take part in the Amazon Cloud Technology Cloud Discovery Lab event, where I tried out the Amazon SageMaker platform and trained a face autoencoder on it. Compared with training locally, the speedup was obvious. This article focuses on Amazon SageMaker and autoencoders.

The autoencoder is a very simple network, and the concept goes back decades. Early deep autoencoders were often pretrained layer by layer with restricted Boltzmann machines, but with today's hardware they can be trained end to end. There are many variants, such as variational autoencoders, denoising autoencoders, and regularized autoencoders. Because autoencoders are trained with self-supervised learning, they can achieve good results at a relatively low cost, since no manual labels are needed.

In this article we will train an autoencoder. Once it is trained, all kinds of interesting experiments become possible: editing a person's expression, morphing face A gradually into face B, aging a face from childhood to adulthood, generating new faces, and so on.

Here we will focus on two of these experiments: face morphing and face generation.

1.1 What is Amazon SageMaker

Amazon SageMaker is a fully managed machine learning service platform that covers every stage of machine learning, from data labeling to deployment. Developers can quickly build and train models and deploy them to hosted environments. Amazon SageMaker provides Jupyter notebooks and supports popular frameworks: not only MXNet, but also mainstream frameworks such as PyTorch and TensorFlow.

1.2 What’s special

For data labeling, SageMaker Ground Truth supports labeling as a team. After a certain amount of data has been labeled by hand, Ground Truth can label the rest automatically, falling back to human labelers whenever its labels are uncertain.

Amazon SageMaker provides data storage, model deployment and other services, and these operations can be completed with one click.

Amazon SageMaker also provides many high-level APIs: Amazon SageMaker Autopilot can automatically train and tune models, and deployed models can be monitored so they can be improved over time.

2. Machine learning process

2.1 Overall machine learning process

Before using Amazon SageMaker for machine learning tasks, let's first review the overall machine learning workflow, which can be divided into the following steps:

  1. Data collection
  2. Data cleaning
  3. Processing the data into model inputs
  4. Model training
  5. Model evaluation
  6. Model deployment
  7. Monitoring, collecting new data, and re-evaluating the model

These steps are nominally a straight line, but in practice you often loop back to earlier steps and run them again.

(1) Data processing

Steps 1-3 are the data processing part: taking the data from nothing to raw, and from raw chaos to a standardized form. This stage may involve techniques such as crawlers, regular expressions, normalization, and standardization. The processed data is usually represented as a tensor (a multidimensional array), whose shape depends on the data type. Image data is usually shaped n_sample × channel × height × width, i.e. number of samples, number of channels, height, and width (sometimes the channel dimension is placed last). Tabular data is processed into n_sample × n_feature, i.e. number of samples and number of features. Text data is shaped n_sample × sequence_len × 1, i.e. number of samples, sequence length, and a single feature per token. Other data such as time series, video, and stereoscopic images are likewise processed into tensors of fixed shape.

After the data is processed into tensors, some feature engineering can also be done, such as feature selection, standardization, and normalization. This step benefits model training.
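
As a quick illustration, here is a minimal sketch (with made-up data, not part of the original experiment) of the standardization and normalization steps mentioned above, using sklearn:

import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.random.rand(100, 8).astype(np.float32)      # hypothetical table: 100 samples, 8 features

standardized = StandardScaler().fit_transform(X)   # zero mean, unit variance per feature
normalized = MinMaxScaler().fit_transform(X)       # each feature rescaled into [0, 1]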

(2) Training model

Steps 4-5 are the training and evaluation part. Once the data is ready, training can begin. This requires deciding several things: the model structure (i.e. which algorithm to use), the corresponding hyperparameters, the optimizer, and so on. Usually several models are trained and the best one is selected as the final model.

To select good hyperparameters, grid search with cross-validation is commonly used; sklearn provides an implementation, and SageMaker offers similar but more powerful functionality, which is touched on later.
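
For reference, a grid search in sklearn looks roughly like this (the SVC model, parameter grid, and X_train/y_train are placeholders for your own setup):

from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

param_grid = {"C": [0.1, 1, 10], "kernel": ["rbf", "linear"]}
search = GridSearchCV(SVC(), param_grid, cv=5)      # 5-fold cross-validation per combination
search.fit(X_train, y_train)                        # X_train / y_train: your processed data
print(search.best_params_, search.best_score_)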

(3) Deployment and maintenance

After choosing the best model, it can be deployed online, for example behind an API or inside a web page, mini program, or mobile app. Once live, the model may run into various problems and gradually go stale, so it needs to be monitored and maintained. You can keep collecting data (with user consent) after launch and repeat the previous steps with it to iteratively improve the model.

2.2 Machine learning process in SageMaker

The machine learning process in SageMaker is the same as above. Let's actually see how each step is performed.

(1) Data processing

You can create Jupyter notebooks in Amazon SageMaker, and within a notebook you can run commands such as pip and wget. Data can be processed with any of the tools you normally use, or with the SageMaker Python SDK, which has an sklearn submodule for data processing.

SKLearnProcessor can execute sklearn scripts to process data. First, you need to create an SKLearnProcessor object, and then call the run method to process the data. The sample code is as follows:

from sagemaker import get_execution_role
from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.processing import ProcessingInput, ProcessingOutput

role = get_execution_role()
sklearn_processor = SKLearnProcessor(
    framework_version="0.20.0", role=role, instance_type="ml.t3.medium", instance_count=1
)
sklearn_processor.run(
    code="preprocessing.py",
    inputs=[
        ProcessingInput(source="s3://your-bucket/path/to/your/data", destination="/opt/ml/processing/input"),
    ],
    outputs=[
        ProcessingOutput(output_name="train_data", source="/opt/ml/processing/train"),
        ProcessingOutput(output_name="test_data", source="/opt/ml/processing/test"),
    ],
    arguments=["--train-test-split-ratio", "0.2"],
)
preprocessing_job_description = sklearn_processor.jobs[-1].describe()

Before calling the run method, you need to write a preprocessing.py file that does the actual data processing; it corresponds to the code parameter. The command-line arguments passed to preprocessing.py are given by arguments.

Inside preprocessing.py you can use sklearn to do the actual work. After the job finishes, information about it is stored in preprocessing_job_description, and the output locations can be read from preprocessing_job_description['Outputs'].
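
The contents of preprocessing.py are entirely up to you; a minimal sketch that matches the inputs, outputs, and arguments configured above might look like this (the CSV file name and columns are hypothetical):

import argparse

import pandas as pd
from sklearn.model_selection import train_test_split

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--train-test-split-ratio", type=float, default=0.2)
    args = parser.parse_args()

    # the ProcessingInput above is mounted at /opt/ml/processing/input
    df = pd.read_csv("/opt/ml/processing/input/data.csv")
    train, test = train_test_split(df, test_size=args.train_test_split_ratio)

    # files written to the ProcessingOutput source paths are uploaded back to S3
    train.to_csv("/opt/ml/processing/train/train.csv", index=False)
    test.to_csv("/opt/ml/processing/test/test.csv", index=False)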

(2) Training model

Model training works the same way as data processing: you prepare a training script, train.py. Once the data has been processed and train.py is ready, the following code starts training. (Here the instance type is kept the same as the one used for data processing.)

from sagemaker.sklearn.estimator import SKLearn

sklearn = SKLearn(
    entry_point="train.py", framework_version="0.20.0", instance_type="ml.t3.medium", role=role
)
sklearn.fit({"train": preprocessed_training_data})
training_job_description = sklearn.jobs[-1].describe()
model_data_s3_uri = "{}{}/{}".format(
    training_job_description["OutputDataConfig"]["S3OutputPath"],
    training_job_description["TrainingJobName"],
    "output/model.tar.gz",
)
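
train.py itself is an ordinary sklearn script; a minimal sketch (the model and the "label" column are placeholders, not part of the original article) could be:

import argparse
import os

import joblib
import pandas as pd
from sklearn.linear_model import LogisticRegression

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    # SageMaker exposes these locations through environment variables
    parser.add_argument("--model-dir", type=str, default=os.environ.get("SM_MODEL_DIR"))
    parser.add_argument("--train", type=str, default=os.environ.get("SM_CHANNEL_TRAIN"))
    args = parser.parse_args()

    df = pd.read_csv(os.path.join(args.train, "train.csv"))   # produced by preprocessing.py
    X, y = df.drop(columns=["label"]), df["label"]

    model = LogisticRegression().fit(X, y)
    joblib.dump(model, os.path.join(args.model_dir, "model.joblib"))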

The code to evaluate the model is also in the same style:

sklearn_processor.run(
    code="evaluation.py",
    inputs=[
        ProcessingInput(source=model_data_s3_uri, destination="/opt/ml/processing/model"),
        ProcessingInput(source=preprocessed_test_data, destination="/opt/ml/processing/test"),
    ],
    outputs=[ProcessingOutput(output_name="evaluation", source="/opt/ml/processing/evaluation")],
)
evaluation_job_description = sklearn_processor.jobs[-1].describe()

(3) Deployment

After building and training the model, you can deploy the model to an endpoint to obtain predictive inference results. Deployment can be done with the following code:

predictor = sklearn.deploy(initial_instance_count=1, instance_type="ml.m4.xlarge")

The instance type can be selected according to the task requirements.
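
Once the endpoint is up, predictions are obtained by calling predict on it; roughly like this (test_features stands in for your own data):

predictions = predictor.predict(test_features)   # test_features: an n_sample x n_feature array
predictor.delete_endpoint()                      # delete the endpoint when finished to stop charges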

2.3 Hands-on practice

The rest of this article is the hands-on part: we will use SageMaker to train a face autoencoder and then use it for face morphing and face generation experiments.

3. Autoencoders

3.1 Introduction to autoencoders

An autoencoder is a very simple network, usually consisting of an encoder and a decoder. The encoder-decoder pair can be fully connected, convolutional, or some other architecture. In the early days, the encoder and decoder were trained separately, but now they are usually trained end to end.

The encoder gradually reduces the dimensionality of the input until it obtains a fixed-length vector, which serves as the encoding of the input data. The decoder takes the encoder's output and produces data with the same shape as the encoder's input. The goal of training is to make the output as close as possible to the input.

Overall, the autoencoder is trained like a supervised model, except that the target is identical to the input features. This kind of learning, where the labels come from the data itself, is called self-supervised learning.
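
In other words, if E is the encoder and D the decoder, training minimizes the reconstruction error L(x) = || x − D(E(x)) ||², where the "label" for each input x is x itself.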

Autoencoders have a simple structure but some very useful properties: training is simple and requires no manual labeling, and they can be used to reduce the dimensionality of data, create new data, and more. Suppose face images are encoded into 1024-dimensional vectors; each dimension of such a vector may then correspond to a feature, for example the n-th dimension might encode expression and the k-th dimension gender. Knowing this, you can do some interesting things.

3.2 Environment preparation

For the experiments we use an Amazon SageMaker notebook instance; create one with the default options. Make a note of the instance type you choose, since the later training code needs to reference a matching type.

After creating the environment, you can run the following code in your notebook to obtain the current role and S3 bucket:

import os

import sagemaker

sess = sagemaker.Session()
role = sagemaker.get_execution_role()
bucket = sess.default_bucket()

Now the data preparation can start.

3.3 Data processing

PyTorch is used here for data processing and training. These are the modules we will need:

import os

import torch
from PIL import Image
from torch import nn
from torch import optim
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms

This experiment uses the CelebA dataset, which can be downloaded from http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html . The dataset contains more than 200,000 aligned face images. After downloading the images to your computer, upload the data to your notebook instance. To make loading convenient, create a Dataset class:

class FaceDataset(Dataset):
    def __init__(self, data_dir="./datasets/img_align_celeba", image_size=64):
        self.image_size = image_size
        # resize, center-crop, and normalize every image to [-1, 1]
        self.trans = transforms.Compose([
            transforms.Resize(image_size),
            transforms.CenterCrop(image_size),
            transforms.ToTensor(),
            transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
        ])
        self.image_paths = []
        if os.path.isdir(data_dir):
            # a directory: walk it and collect every image file
            for root, dirs, files in os.walk(data_dir):
                for file in files:
                    self.image_paths.append(os.path.join(root, file))
        elif os.path.isfile(data_dir):
            # a text file: each whitespace-separated entry is an image path
            with open(data_dir, encoding='utf-8') as f:
                self.image_paths = f.read().strip().split()

    def __getitem__(self, item):
        return self.trans(Image.open(self.image_paths[item]))

    def __len__(self):
        return len(self.image_paths)

Create a DataLoader below to load data:

dataset = FaceDataset(data_dir="./datasets/img_align_celeba")
dataloader = DataLoader(dataset, 64)

In addition, you can upload data to the S3 bucket through the following code:

inputs = sess.upload_data(path="datasets/img_align_celeba", bucket=bucket, key_prefix="sagemaker/img_align_celeba")

After the data is prepared, you need to write the model and training script.

3.4 Model training

We train the autoencoder on the face data and then use it for the later experiments. The network consists of convolutions in the encoder and transposed convolutions in the decoder. The code is as follows:

class FaceAutoEncoder(nn.Module):
    def __init__(self, encoded_dim=1024):
        super(FaceAutoEncoder, self).__init__()
        # [b, 3, 64, 64] --> [b, 1024, 1, 1]
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),

            nn.Conv2d(64, 64 * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(64 * 2),
            nn.LeakyReLU(0.2, inplace=True),

            nn.Conv2d(64 * 2, 64 * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(64 * 4),
            nn.LeakyReLU(0.2, inplace=True),

            nn.Conv2d(64 * 4, 64 * 8, 4, 2, 1, bias=False),
            nn.BatchNorm2d(64 * 8),
            nn.LeakyReLU(0.2, inplace=True),

            nn.Conv2d(64 * 8, encoded_dim, 4, 1, 0, bias=False),
            nn.LeakyReLU(0.2, inplace=True)
        )
        # [b, 1024, 1, 1] - > [b, 3, 64, 64]
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(encoded_dim, 64 * 8, 4, 1, 0, bias=False),
            nn.BatchNorm2d(64 * 8),
            nn.ReLU(True),

            nn.ConvTranspose2d(64 * 8, 64 * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(64 * 4),
            nn.ReLU(True),

            nn.ConvTranspose2d(64 * 4, 64 * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(64 * 2),
            nn.ReLU(True),

            nn.ConvTranspose2d(64 * 2, 64, 4, 2, 1, bias=False),
            nn.BatchNorm2d(64),
            nn.ReLU(True),
            nn.ConvTranspose2d(64, 3, 4, 2, 1, bias=True),
            nn.Tanh()
        )

    def forward(self, x):
        x = self.encoder(x)
        x = self.decoder(x)
        return x

The encoder converts a 64×64 image into a 1024-dimensional vector, and the decoder uses transposed convolutions to turn that vector back into a 64×64 image. The goal is for the model's output to be as close as possible to its input, so we use mean squared error as the loss function, with the input and the target being the same batch of data. Next, write a training script, autoencoder.py; the code is as follows:

import argparse
import os

import torch
import torch.nn.functional as F
from torch import optim
from torch.utils.data import DataLoader

# FaceDataset and FaceAutoEncoder are the classes defined above,
# included in (or imported by) autoencoder.py


def train(args):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    dataset = FaceDataset(args.data_dir)
    dataloader = DataLoader(dataset, batch_size=64)
    model = FaceAutoEncoder().to(device)
    optimizer = optim.SGD(model.parameters(), lr=args.lr, momentum=0.9)

    for epoch in range(1, args.epochs + 1):
        model.train()
        for batch_idx, data in enumerate(dataloader, 1):
            data = data.to(device)
            optimizer.zero_grad()
            output = model(data)
            # reconstruction loss: input and target are the same batch
            loss = F.mse_loss(output, data)
            loss.backward()
            optimizer.step()
    save_model(model, args.model_dir)


def model_fn(model_dir):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = FaceAutoEncoder()
    # model.pth only contains the decoder weights (see save_model below)
    with open(os.path.join(model_dir, "model.pth"), "rb") as f:
        model.decoder.load_state_dict(torch.load(f, map_location=device))
    return model.to(device)


def save_model(model, model_dir):
    path = os.path.join(model_dir, "model.pth")
    torch.save(model.decoder.cpu().state_dict(), path)


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--epochs",
        type=int,
        default=10,
        metavar="N",
        help="number of epochs to train (default: 10)",
    )
    parser.add_argument(
        "--lr", type=float, default=0.01, metavar="LR", help="learning rate (default: 0.01)"
    )
    # directories provided by SageMaker through environment variables
    parser.add_argument("--model-dir", type=str, default=os.environ.get("SM_MODEL_DIR", "."))
    parser.add_argument("--data-dir", type=str,
                        default=os.environ.get("SM_CHANNEL_TRAINING", "./datasets/img_align_celeba"))
    train(parser.parse_args())

Some parameters are exposed through argparse. Once the script is ready, training can start. Since the script uses PyTorch, we train with the PyTorch estimator from sagemaker.pytorch. The code is as follows:

from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point="autoencoder.py",
    role=role,
    py_version="py38",
    framework_version="1.11.0",
    instance_count=1,
    instance_type="ml.c5.2xlarge",
    hyperparameters={"epochs": 4},
)
estimator.fit({"training": inputs})

The entry_point is the training script written above. Now just wait for training to finish; when it completes, SageMaker uploads the model artifact, which in this setup contains only the decoder weights (model.pth).
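
If you want to inspect the artifact locally, the trained model is packaged as model.tar.gz at the S3 location given by estimator.model_data; a sketch for downloading and unpacking it:

import tarfile

from sagemaker.s3 import S3Downloader

S3Downloader.download(estimator.model_data, "model_artifacts")      # fetches model.tar.gz from S3
with tarfile.open("model_artifacts/model.tar.gz") as tar:
    tar.extractall("model_artifacts")                               # contains model.pth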

After training, the model can be deployed with a single line:

predictor = estimator.deploy(initial_instance_count=1, instance_type="ml.m4.xlarge")

However, the autoencoder by itself only reconstructs its input; to morph or generate faces we need to run inference with the decoder part, as described next.

4. Face morphing with the autoencoder

4.1 How face morphing works

As mentioned earlier, one advantage of the autoencoder is that operations on the input can be replaced by operations on the encoding. We now have a trained face autoencoder. Suppose face A is encoded as z1 and face B as z2, and we want A to change gradually into B. The problem then becomes producing a gradual transition between the vectors z1 and z2, which can be done with simple interpolation: insert n intermediate vectors between the two codes and feed each of them to the decoder. The decoded images are faces that lie between A and B, so face morphing reduces to interpolation.
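
Concretely, with n interpolation steps, the i-th intermediate code is z_i = z1 + i · (z2 − z1) / (n − 1) for i = 0, …, n − 1; decoding each z_i gives one frame of the morph.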

4.2 Implementing face morphing

First, implement the interpolation itself, which is very simple:

def interpolate(x1, x2, num):
    # linearly interpolate num tensors between x1 and x2 (endpoints included)
    result = torch.zeros((num, *x1.shape))
    step = (x2 - x1) / (num - 1)
    for i in range(num):
        result[i] = x1 + step * i
    return result

This function takes two tensors of the same shape and returns num interpolated tensors, which will be fed to the decoder. Next, run inference with the encoder and decoder:

import matplotlib.pyplot as plt
from torchvision.utils import make_grid

# load the dataset
dataloader = DataLoader(
    FaceDataset(data_dir='/home/zack/Files/datasets/img_align_celeba', image_size=64),
    batch_size=2
)
model = FaceAutoEncoder()
model.load_state_dict(torch.load('../outputs/face_auto_encoder.pth'))

model.eval()
with torch.no_grad():
    for idx, data in enumerate(dataloader):
        # encode the two faces
        encoded1 = model.encoder(data[0].reshape((1, 3, 64, 64)))
        encoded2 = model.encoder(data[1].reshape((1, 3, 64, 64)))
        # interpolate between the two face codes
        encoded = interpolate(encoded1[0], encoded2[0], 64)
        # decode the interpolated codes back into faces
        outputs = model.decoder(encoded).reshape((64, 3, 64, 64))
        outputs = make_grid(outputs, normalize=True)
        plt.imshow(outputs.numpy().transpose((1, 2, 0)))
        plt.show()

In the resulting grid of faces, the transition from one face to the other is very natural. Now this part can be deployed; the deployment code is as follows:

from sagemaker.pytorch import PyTorchModel

pytorch = PyTorchModel(
    model_data=model_data,             # S3 URI of a model.tar.gz containing the autoencoder weights
    role=role,
    entry_point="inference.py",
    source_dir="code",
    framework_version="1.11.0",
    py_version="py38",
    sagemaker_session=sess,
)
predictor = pytorch.deploy(
    initial_instance_count=1,
    instance_type='ml.m5.large',
    endpoint_name=endpoint_name,       # an endpoint name of your choice
    wait=True,
)

Here, inference.py is the inference script; its code is as follows:

import os

import torch

# FaceAutoEncoder and interpolate are defined in the same source_dir ("code")


def predict_fn(input_data, model):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    input_data = input_data.to(device)                     # expected shape: [2, 3, 64, 64], two faces
    with torch.no_grad():
        encoded = model.encoder(input_data)                # [2, 1024, 1, 1]
        codes = interpolate(encoded[0], encoded[1], 64)    # 64 steps between the two codes
        return model.decoder(codes)


def model_fn(model_dir):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = FaceAutoEncoder()
    # assumes the deployed model.pth contains the full autoencoder state dict,
    # since morphing needs the encoder as well as the decoder
    model.load_state_dict(torch.load(os.path.join(model_dir, "model.pth"), map_location=device))
    return model.to(device).eval()

Then call predictor.predict with a 2 × 3 × 64 × 64 tensor (two faces) to run inference and get back the interpolated face images.
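
As a quick test (assuming the PyTorch predictor's default numpy serialization), two faces from the dataset can be sent to the endpoint like this:

import numpy as np

faces = np.stack([dataset[0].numpy(), dataset[1].numpy()])   # shape: [2, 3, 64, 64]
morph_frames = predictor.predict(faces)                      # decoded faces along the interpolation path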

5. Autoencoder generates faces

Besides face morphing, autoencoders can also be used to generate faces. The key again lies in the encodings.

5.1 Face distribution

The idea behind generating faces with an autoencoder is simple. During training, each face is encoded into a 1024-dimensional vector. Now assume these face codes follow a (multivariate) Gaussian distribution: if we can estimate its mean and covariance, we know what the distribution looks like, and once it is known we can sample new code vectors from it and pass them to the decoder to generate faces.

The mean and covariance can be estimated from the data. The code is below; the result is saved as an npz file:

import numpy as np

zdim = 1024
mean = np.zeros((zdim,), dtype=np.float32)
cov = np.zeros((zdim, zdim), dtype=np.float32)

device = "cuda" if torch.cuda.is_available() else "cpu"
model.eval()
model = model.to(device)
with torch.no_grad():
    for idx, data in enumerate(dataloader):
        try:
            data = data.to(device)
            # flatten each code into a zdim-dimensional vector
            encoded = model.encoder(data).view(data.shape[0], -1)
            # accumulate per-batch mean and covariance; averaged over batches below
            mean += encoded.mean(axis=0).cpu().numpy()
            cov += np.cov(encoded.cpu().numpy().T)
            if idx % 50 == 0:
                print(f"\ridx: {idx}/{len(dataloader)}", end="")
        except Exception:
            pass  # skip unreadable images
    mean /= (idx + 1)
    cov /= (idx + 1)
np.savez('face_distribution.npz', mean, cov)

5.2 Generate faces

To generate a face, sample from the Gaussian distribution estimated above and hand the sampled vector to the decoder for decoding. The code:

# load the face code distribution
distribution = np.load('face_distribution.npz')
mean = distribution['arr_0']
cov = distribution['arr_1']
# sample a batch of code vectors
batch_size = 64
z = np.random.multivariate_normal(
    mean,
    cov,
    batch_size
).astype(np.float32)
# decode the sampled codes into faces
with torch.no_grad():
    encoded = torch.from_numpy(z).view(batch_size, 1024, 1, 1)
    outputs = model.decoder(encoded)
    # the decoder ends in Tanh, so outputs lie in [-1, 1]; make_grid rescales them for display
    grid = make_grid(outputs, normalize=True).numpy().transpose((1, 2, 0))
    plt.imshow(grid)
    plt.show()

In the generated faces, the facial features come out clearly, but the backgrounds are somewhat blurry.

The Amazon SageMaker deployment code from before can be reused; only the inference code needs to change. For example, predict_fn in inference.py can be modified as follows (this version simply samples a code from a standard normal distribution; you could also load face_distribution.npz and sample from the estimated distribution):

def predict_fn(input_data, model):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    # sample a random 1024-dimensional code and decode it into a face
    z = torch.randn(1, 1024, 1, 1).to(device)
    with torch.no_grad():
        return model.decoder.forward(z)

5.3 Related generative models

So far we have used Amazon SageMaker to run face morphing and face generation experiments built around autoencoders. Autoencoders are not a mainstream generative model, though; for image generation, Stable Diffusion is now far more popular, with stronger capabilities and more realistic output. AWS provides an official Stable Diffusion workshop at https://catalog.us-east-1.prod.workshops.aws/workshops/3b86fa89-da3a-4e5f-8e77-b45fb11adf4a/zh-CN , which can be run directly on Amazon SageMaker; see the workshop manual for details. Here are some sample results.

For example, with the text prompt "cat holding a long sword", you can get an image like this:

https://img-blog.csdnimg.cn/c305e7dbaf6e4c2f80e4d1588277e603.png#pic_center

Or with "Van Gogh starry sky":

https://img-blog.csdnimg.cn/5d897bbb097547e8971eb02fa87812c9.png#pic_center

The generated images are different every time, so there is plenty of room to try more interesting prompts.

6. Summary

6.1 Algorithm summary

The autoencoder has a very simple structure and is easier to train than a GAN, yet it is surprisingly powerful. In the experiments above, we turned the manipulation of the output into the manipulation of the encoding. Ideally each dimension of the code would control exactly one feature, but in practice it is not that clean. Other variants of the autoencoder, such as variational autoencoders and ALAE, are more capable still and can even generate realistic faces comparable to StyleGAN.

Beyond using autoencoders directly, autoencoder-like structures can also be embedded in other networks; the U-Net in Stable Diffusion, for example, is also an encoder-decoder structure.

6.2 Summary of Amazon SageMaker

Among the many machine learning platforms, Amazon SageMaker is one of the more comprehensive and application-oriented ones. It covers every stage of the machine learning workflow, and existing Python development habits carry over to it directly. It also provides higher-level APIs, letting users focus on the algorithms themselves.

Amazon SageMaker supports sklearn, PyTorch, TensorFlow, Hugging Face, and more, with corresponding wrappers for these mainstream libraries and frameworks. It also provides a very convenient way to deploy models.

To make training easier, Amazon SageMaker also offers SageMaker Autopilot, which automatically searches over models and hyperparameter settings to train an optimal model.

For more about the Cloud Discovery Lab, see https://dev.amazoncloud.cn/experience?trk=cndc-detail&sc_medium=corecontent&sc_campaign=product&sc_channel=csdn , which hosts many interesting experiment cases. Through the Cloud Discovery Lab, developers can learn and practice cloud technologies while sharing their experience with other developers. It is not only a space for hands-on experience but also a platform for sharing.
