Amazon SageMaker: A complete solution for building enterprise-level AI models

The High Cost of Enterprise AI Applications

Artificial intelligence is still riding the crest of a technological wave. With advances in AI chips, big data, and cloud computing, deep learning has continued to mature. AIGC, led by ChatGPT, is in the spotlight: AI painting, AI music composition, AI programming, AI writing, and a stream of other AI products now support production work. Technologies such as edge computing, federated learning, and multi-agent systems are moving from academia into industry and improving production efficiency. Traditional computer vision and natural language processing reach further into daily life, and concepts such as smart cities, smart homes, and smart transportation keep emerging.

In the foreseeable future, artificial intelligence will expand into even more fields. However, building enterprise-level AI applications requires attention to many aspects, including:

● Data preparation: collect business data and process it into a structured format while protecting privacy and complying with regulations.

● Data cleaning: preprocess, clean, and convert the data to ensure its quality and accuracy.

● Feature engineering: extract and transform features from the data to give the model useful information.

● Model design: select an appropriate algorithm based on business requirements and data analysis results, then design and train the model.

● Model tuning: adjust hyperparameters, choose a suitable loss function, and refine the optimization algorithm to improve model performance.

● Model deployment: deploy the trained model to production to provide reliable, efficient, and secure services.

● Model monitoring: monitor the model to ensure its accuracy and stability, and resolve problems promptly.

● Continuous optimization: keep optimizing and improving the model to meet business needs and improve results.

● …

For industries outside the AI field, this usually means seeking support from specialist teams or partners, a process that consumes manpower, resources, and effort. There is therefore strong demand for a convenient, complete enterprise-level AI solution that lets downstream industries deliver flexible commercial services quickly.

Fortunately, Amazon provides such a platform: Amazon SageMaker, which lowers the barrier to building AI models in applied fields and improves production efficiency.

What is Amazon SageMaker?

Amazon SageMaker is a managed machine learning service provided by Amazon Web Services. It enables data scientists and developers to quickly build, train and deploy machine learning models.

Amazon SageMaker provides a set of tools and features that let users complete the entire machine learning workflow, including data preparation, model training, model tuning, and deployment, in one integrated environment. It also provides a variety of built-in algorithms and supported frameworks, including XGBoost, TensorFlow, and PyTorch.

Amazon SageMaker is a comprehensive machine learning platform with a wide range of application scenarios.

● Enterprise-grade machine learning applications

Amazon SageMaker provides features such as automatic model tuning, model interpretation, and model deployment, making it easy for users to build and deploy machine learning models. Example: A financial institution can use Amazon SageMaker to build and deploy a fraud detection model to identify credit card fraud.

● Cloud native machine learning

Amazon SageMaker easily integrates with other Amazon cloud services. For example, users can use Amazon Lambda and Amazon API Gateway to create an API that makes predictions from Amazon SageMaker models accessible to other applications.

● High performance machine learning

Amazon SageMaker provides high-performance computing instances and GPU instances that can handle large-scale machine learning datasets and complex deep learning models. Example: A medical image diagnosis application can use a GPU instance in Amazon SageMaker to train and deploy a deep learning model to identify a patient's condition.

● Machine learning model interpretability

Amazon SageMaker provides model explanation capabilities that help users understand the decision-making process of machine learning models. Example: An e-commerce company could use Amazon SageMaker to interpret the predictions of a recommender system model to better understand why a product was recommended to a certain user.
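
The Lambda and API Gateway integration mentioned in the list above can be sketched as a small handler. This is a minimal illustration rather than a reference implementation: the endpoint name `fraud-detector-endpoint`, the CSV payload format, and the injectable `runtime_client` parameter are assumptions introduced here. Only the `invoke_endpoint` call of the boto3 `sagemaker-runtime` client is an existing API.

```python
import json

ENDPOINT_NAME = "fraud-detector-endpoint"  # hypothetical endpoint name


def lambda_handler(event, context, runtime_client=None):
    """Forward the request body from API Gateway to a SageMaker endpoint."""
    if runtime_client is None:
        # Imported lazily so the handler can be unit-tested without AWS access.
        import boto3

        runtime_client = boto3.client("sagemaker-runtime")

    # With a proxy integration, API Gateway delivers the request body as a string.
    payload = event["body"]

    response = runtime_client.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="text/csv",  # assumed input format of the deployed model
        Body=payload,
    )
    # The response body is a stream; read and decode it.
    prediction = response["Body"].read().decode("utf-8")

    return {"statusCode": 200, "body": json.dumps({"prediction": prediction})}
```

Passing the client in as a parameter is only a testing convenience; in a deployed function the default branch would create the real client.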

Next, two case studies built on Amazon SageMaker are presented.

Case 1: Quickly Build an Image Classification Application

Convolutional Neural Networks

Inspired by biological neuroscience, the artificial neural network (ANN) is a massively parallel, interconnected network of simple adaptive units that can simulate how a biological nervous system responds to external input. The basic unit of a neural network is called a neuron, and each neuron is connected to several others to form the network. When a neuron's input exceeds a threshold (its bias), it is activated and produces an output that passes the signal on to other parts of the network.

A convolutional neural network (CNN) is a neural network built from several convolutional layers and is mainly used for vision tasks. Its basic principle is template matching with learned templates: a template (the convolution kernel) produces its maximum response only on regions of the input image whose pattern matches it. The kernels are fitted by training the network rather than designed by hand, so image features are extracted without explicit feature detection and calculation. This gives CNNs strong generalization ability on image tasks.

At the same time, under the translation-invariance and locality assumptions, every position of a convolutional layer's input feature map shares the same kernel parameters regardless of pixel coordinates, so a CNN achieves parameter sharing, which allows parallel computation and speeds up learning. By contrast, if a two-dimensional image is fed to a fully connected network, the input layer needs a very large number of neurons, and because each neuron connects to every neuron in the adjacent layer, the weight matrix to be optimized grows with the product of the layer sizes, bringing unacceptable computational cost. A CNN instead aggregates and compresses high-dimensional information through convolution, filters out much of the redundant information, and is therefore far easier to train.
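
To make the comparison concrete, the following sketch counts weights for both layer types. It is an illustration under simple assumptions: biases are ignored, the fully connected hidden layer is assumed to be as wide as the input, and the convolution uses the 1-to-10-channel, 5x5 kernel that appears in the network later in this article. The function names are hypothetical.

```python
def fully_connected_weights(height, width, hidden_units):
    """Number of weights linking every input pixel to every hidden unit."""
    return height * width * hidden_units


def conv_weights(in_channels, out_channels, kernel_size):
    """Number of kernel weights in a conv layer; independent of image size."""
    return in_channels * out_channels * kernel_size * kernel_size


for side in (28, 224):
    fc = fully_connected_weights(side, side, hidden_units=side * side)
    conv = conv_weights(in_channels=1, out_channels=10, kernel_size=5)
    print(f"{side}x{side} image: fully connected = {fc:,} weights, conv = {conv} weights")
```

The fully connected count depends on the image size, while the convolutional count stays fixed because the same kernel is reused at every position.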

For the classic MNIST handwritten digit classification experiment, we design the neural network ourselves and compare training locally with training on Amazon SageMaker. The experiment proceeds as follows:

▌Build a convolutional neural network.

▌Load the dataset. Download the MNIST handwritten digit dataset, split it into training, validation, and test sets, and wrap it in an iterable data loader object.

▌Train the model. Define the loss function and the optimizer, compute the loss by forward propagation, update the model parameters by backpropagation, iterate until the training error converges, and save the model locally.
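
The dataset split in the second step can be sketched with the standard library alone. MNIST ships with 60,000 training images and 10,000 test images, so only the training portion needs to be divided. The 10% validation fraction, the fixed seed, and the `split_indices` helper are assumptions for illustration; the article does not state how the split was made.

```python
import random


def split_indices(num_samples, val_fraction, seed=0):
    """Shuffle sample indices and split them into training and validation parts."""
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)
    num_val = round(num_samples * val_fraction)
    return indices[num_val:], indices[:num_val]


# Divide the 60,000 MNIST training images into training and validation subsets.
train_idx, val_idx = split_indices(60000, val_fraction=0.1)
print(len(train_idx), len(val_idx))
```

The resulting index lists could then be passed to something like `torch.utils.data.Subset` to build the two loaders.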

Local Test Version

Build the convolutional neural network as shown below:

import torch.nn as nn


class CNN(nn.Module):
    '''
    * @brief: convolutional neural network
    '''
    def __init__(self):
        super().__init__()
        self.convPoolLayer_1 = nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=10, kernel_size=5),
            nn.MaxPool2d(kernel_size=2),
            nn.ReLU()
        )
        self.convPoolLayer_2 = nn.Sequential(
            nn.Conv2d(in_channels=10, out_channels=20, kernel_size=5),
            nn.MaxPool2d(kernel_size=2),
            nn.ReLU()
        )
        self.fcLayer = nn.Linear(320, 10)


    def forward(self, x):
        batchSize = x.size(0)
        x = self.convPoolLayer_1(x)
        x = self.convPoolLayer_2(x)
        x = x.reshape(batchSize, -1)
        x = self.fcLayer(x)
        return x
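
The input size of 320 in the final linear layer follows from the layer shapes. The sketch below traces a 28x28 MNIST image through two rounds of 5x5 convolution (no padding, stride 1) and 2x2 max pooling, which matches the layer settings above; the helper names are hypothetical.

```python
def conv_out(size, kernel_size, stride=1, padding=0):
    """Spatial size after a convolution."""
    return (size + 2 * padding - kernel_size) // stride + 1


def pool_out(size, kernel_size):
    """Spatial size after max pooling with stride equal to the kernel size."""
    return size // kernel_size


size = 28                              # MNIST images are 28x28
size = pool_out(conv_out(size, 5), 2)  # 28 -> 24 -> 12
size = pool_out(conv_out(size, 5), 2)  # 12 -> 8 -> 4
flattened = 20 * size * size           # 20 output channels of 4x4 each
print(flattened)
```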

Use the Dataset class provided by PyTorch to load and preview the MNIST dataset:

from abc import abstractmethod
import numpy as np
from torchvision.datasets import mnist
from torch.utils.data import Dataset
from PIL import Image


class mnistData(Dataset):
    '''
    * @brief: abstract interface for the MNIST dataset
    * @param[in]: dataPath -> directory where the dataset is stored
    * @param[in]: transforms -> transforms applied to each image
    '''
    def __init__(self, dataPath: str, transforms=None) -> None:
        super().__init__()
        self.dataPath = dataPath
        self.transforms = transforms
        self.data, self.label = [], []


    def __len__(self) -> int:
        return len(self.label)


    def __getitem__(self, idx: int):
        img = self.data[idx]
        if self.transforms:
            img = self.transforms(img)
        return img, self.label[idx]


    def loadData(self, train: bool) -> tuple:
        '''
        * @brief: download and load the dataset
        * @param[in]: train -> whether to load the training set
        * @retval: list of images and list of labels
        '''
        # Download the dataset if it is not already in the given directory
        dataSet = mnist.MNIST(self.dataPath, train=train, download=True)
        # Separate the images from the labels
        data  = [ i[0] for i in dataSet ]
        label = [ i[1] for i in dataSet ]
        return data, label

Because this is a multi-class classification problem, the network outputs a ten-dimensional vector that softmax converts into a probability distribution. The loss function is cross entropy (F.cross_entropy applies the softmax internally), and the optimizer is stochastic gradient descent:

for images, labels in trainBar:
    images, labels = images.to(config.device), labels.to(config.device)
    # Reset the gradients
    opt.zero_grad()
    # Forward pass
    outputs = model(images)
    # Compute the loss
    loss = F.cross_entropy(outputs, labels)
    # Backward pass
    loss.backward()
    # Update the model parameters
    opt.step()
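
What the loss in the loop above computes can be worked through by hand for a single sample. The sketch below uses only the standard library; the logit values are made up for illustration and do not come from the article's model.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution."""
    peak = max(logits)  # subtracting the maximum keeps exp() numerically stable
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Negative log-probability that softmax assigns to the true class."""
    return -math.log(softmax(logits)[label])


# Ten made-up scores, one per digit class; class 3 has the highest score.
logits = [0.1, -1.2, 0.3, 4.0, 0.0, -0.5, 1.1, 0.2, -2.0, 0.7]
print(f"loss when the true digit is 3: {cross_entropy(logits, 3):.4f}")
print(f"loss when the true digit is 8: {cross_entropy(logits, 8):.4f}")
```

The loss is small when the network puts most of its probability on the true class and large otherwise, which is the signal backpropagation uses to update the weights.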

After ten minutes of training, the model's predictions achieved an accuracy of 89%.

Amazon SageMaker Version

First, build a convolutional neural network:

import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)


    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(-1, 320)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)

Then load the dataset. Once the mirror is set, MNIST downloads the dataset, and upload_data uploads it to Amazon S3, where the Amazon SageMaker training nodes can read it for subsequent training. No additional data loading method has to be defined, so the call is quick and convenient:

from torchvision.datasets import MNIST
from torchvision import transforms


MNIST.mirrors = ["https://sagemaker-sample-files.s3.amazonaws.com/datasets/image/MNIST/"]


MNIST(
    "data",
    download=True,
    transform=transforms.Compose(
        [transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))]
    ),
)


inputs = sagemaker_session.upload_data(path="data", bucket=bucket, key_prefix=prefix)

Then import the PyTorch estimator class from the Amazon SageMaker SDK and create an instance:

from sagemaker.pytorch import PyTorch


estimator = PyTorch(
    entry_point="mnist.py",
    role=role,
    py_version="py38",
    framework_version="1.11.0",
    instance_count=2,
    instance_type="ml.c5.2xlarge",
    hyperparameters={"epochs": 1, "backend": "gloo"},
)

One line of code starts training:

estimator.fit({"training": inputs})
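
The estimator above points at an entry-point script, mnist.py, which the article does not show. As a hedged sketch of how such a script typically picks up its configuration: SageMaker passes the `hyperparameters` dictionary as command-line arguments and exposes paths through environment variables such as `SM_MODEL_DIR` and `SM_CHANNEL_TRAINING` (the channel name matches the "training" key passed to `fit`). The local fallback values and the `parse_args` helper name are assumptions for illustration.

```python
import argparse
import json
import os


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # Hyperparameters arrive as command-line flags, e.g. --epochs 1 --backend gloo.
    parser.add_argument("--epochs", type=int, default=1)
    parser.add_argument("--backend", type=str, default=None)
    # Paths and cluster layout come from environment variables set by SageMaker.
    # The fallbacks are assumptions so the script can also run off-platform.
    parser.add_argument("--model-dir", default=os.environ.get("SM_MODEL_DIR", "./model"))
    parser.add_argument("--data-dir", default=os.environ.get("SM_CHANNEL_TRAINING", "./data"))
    parser.add_argument(
        "--hosts",
        type=json.loads,
        default=json.loads(os.environ.get("SM_HOSTS", '["localhost"]')),
    )
    parser.add_argument("--current-host", default=os.environ.get("SM_CURRENT_HOST", "localhost"))
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(f"training for {args.epochs} epoch(s) on data in {args.data_dir}")
```

The rest of such a script would define the network, run the training loop, and save the model under `args.model_dir`.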

The model completed training in about four minutes, and the test set accuracy reached 91%.

For a small task such as handwritten digit recognition, the advantages of Amazon SageMaker are not very pronounced, but it is already apparent that models can be built quickly and trained efficiently, with no need to implement data reading and backpropagation from scratch. In real applications, this can greatly improve engineering efficiency.

Case 2: Quickly Build an AI Painting Application

Introduction to Diffusion Models

In this section, we quickly build an AI painting application based on Amazon SageMaker and a diffusion model.

First, a brief introduction to the diffusion model. This is a generative AI model for producing high-quality, high-fidelity images. It is inspired by the physical process of diffusion and describes how pixel values are gradually perturbed and evolve over time.

The diffusion algorithm gradually adds noise to a picture until the whole picture becomes pure noise. This process is recorded and then reversed for the AI to learn from. What the AI sees is how a picture full of noise becomes clearer little by little until it turns into a painting. By learning this step-by-step removal of noise, the AI learns to paint.
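
The noising half of this process can be sketched in a few lines. The example follows one common formulation (the DDPM-style closed form, in which the noisy sample at step t mixes the original image with Gaussian noise); the linear schedule, its endpoint values, the 1,000 steps, and the four-pixel toy image are all assumptions chosen for illustration, since the article does not specify them.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Noise variance added at each step, increasing linearly."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t: the fraction of the original signal variance kept at step t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(pixels, alpha_bar, rng):
    """x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * gaussian_noise."""
    return [
        math.sqrt(alpha_bar) * p + math.sqrt(1.0 - alpha_bar) * rng.gauss(0.0, 1.0)
        for p in pixels
    ]


rng = random.Random(0)
image = [1.0, -1.0, 0.5, -0.5]  # a toy "image" of four pixel values
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))

for t in (0, 499, 999):
    noisy = add_noise(image, alpha_bars[t], rng)
    print(f"t={t:4d}  signal scale={math.sqrt(alpha_bars[t]):.3f}  noisy={noisy}")
```

At early steps the image is almost intact; by the last step almost nothing of it remains. Training teaches a network to undo one such step at a time.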

What advantages do diffusion models have over the previously popular GAN models? According to a paper by OpenAI, images generated by diffusion models are of significantly better quality than those from GANs. Moreover, unlike a GAN, a diffusion model does not have to contend with the saddle-point problem and the training instability it brings: it only needs to minimize a single standard loss function, which greatly simplifies model training.

To sum up, with current training techniques diffusion models can skip the delicate model-tuning stage that GANs require and be applied directly to downstream tasks, a case of a new mathematical paradigm being applied to images. In practice, diffusion models are widely used for image generation, image inpainting, image super-resolution, and related tasks. With text input as the conditioning information, they can generate high-quality images from text descriptions, such as animation scenes or natural scenery.

Model building and deployment

First, configure a Notebook in Amazon SageMaker. My configuration is shown in the following screenshot:

[Image: the author's Notebook configuration]

Then create an IAM role for calling other services, including Amazon SageMaker and S3, for example to upload and deploy models. The settings can be kept at their defaults.

After a model is built and trained, Amazon SageMaker lets us deploy it to an endpoint that serves inference requests.

There are several options for deploying models using the Amazon SageMaker managed service, such as:

● Python Development Kit (Boto3)

● Amazon SageMaker Python SDK

● Amazon CLI

● Amazon SageMaker console interactive deployment

● …

Here we take the Python development kit (Boto3) as an example to build this AI painting application, which mainly includes the following steps:

● Install and check dependencies

● Configure the model in Notebook

import torch
import datetime
from diffusers import StableDiffusionPipeline
# Load stable diffusion
pipe = StableDiffusionPipeline.from_pretrained(SD_MODEL, torch_dtype=torch.float16)

● Initialize the Amazon SageMaker session used to deploy the inference endpoint:

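import sagemaker
import boto3


# Create a session first so that its default bucket can be looked up below
sess = sagemaker.Session()

# Set this to a bucket name to override the session's default bucket
sagemaker_session_bucket = None


if sagemaker_session_bucket is None and sess is not None:
    sagemaker_session_bucket = sess.default_bucket()


...
sess = sagemaker.Session(default_bucket=sagemaker_session_bucket)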

● Write the inference script:

import base64
import torch
from io import BytesIO
from diffusers import StableDiffusionPipeline


def model_fn(model_dir):
    # Load stable diffusion and move it to the GPU
    pipe = StableDiffusionPipeline.from_pretrained(model_dir, torch_dtype=torch.float16)
    pipe = pipe.to("cuda")
    return pipe


def predict_fn(data, pipe):
    ...

● Package and upload the model:

from sagemaker.s3 import S3Uploader
sd_model_uri=S3Uploader.upload(local_path=f"{SD_MODEL}.tar.gz", desired_s3_uri=f"s3://{sess.default_bucket()}/stable-diffusion")
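
The upload above presumes that a `.tar.gz` archive of the model already exists; the article does not show how it is created. A minimal sketch using the standard library follows. The `package_model` helper and the directory layout are assumptions: in particular, placing the inference script at `code/inference.py` inside the archive follows the usual Hugging Face on SageMaker convention and should be checked against the documentation for the container version in use.

```python
import tarfile
from pathlib import Path


def package_model(model_dir, archive_path):
    """Pack the contents of model_dir into a gzip-compressed tar archive."""
    model_dir = Path(model_dir)
    with tarfile.open(archive_path, "w:gz") as tar:
        for item in sorted(model_dir.iterdir()):
            # arcname keeps paths relative, so files sit at the archive root
            # instead of under the local directory name.
            tar.add(str(item), arcname=item.name)
    return archive_path
```

For example, `package_model("model", "model.tar.gz")` would archive a local `model` directory that holds the model files together with a `code` subdirectory. The archive should be written outside the directory being packed.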

● Deploy the model to Amazon SageMaker using Hugging Face:

predictor[SD_MODEL] = huggingface_model[SD_MODEL].deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.xlarge",
    endpoint_name=f"{SD_MODEL}-endpoint"
)

The model is now built and deployed, and we can generate custom images through the inference endpoint.

AI Drawing Test (Text-to-Image)

Enter the following test code:

response = predictor[SD_MODEL].predict(
    data={
        "prompt": [
            "Eiffel tower landing on the Mars",
        ],
        "height": 512,
        "width": 512,
        "num_images_per_prompt": 1,
    }
)


# decode images
decoded_images = [decode_base64_image(image) for image in response["generated_images"]]


# visualize generation
for image in decoded_images:
    display(image)
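
The test code calls `decode_base64_image`, which the article does not define. A plausible minimal version is sketched below, assuming each entry of `generated_images` is a base64-encoded image file and that Pillow is installed in the notebook. The split into two functions is a choice made here so that the decoding step can be checked without Pillow.

```python
import base64
from io import BytesIO


def decode_base64_bytes(image_string):
    """Turn a base64 string from the endpoint back into raw image bytes."""
    return base64.b64decode(image_string)


def decode_base64_image(image_string):
    """Open the decoded bytes as a PIL image (requires the Pillow package)."""
    from PIL import Image  # imported lazily; Pillow is a third-party dependency

    return Image.open(BytesIO(decode_base64_bytes(image_string)))
```

In a notebook, `display` is available by default, so the returned images render inline.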

For example, for the prompt "Eiffel tower landing on the Mars" used above, the endpoint returns:

[Image: generated picture of the Eiffel Tower landing on Mars]

Here is the image generated for "Astronaut Rides":

[Image: generated picture for "Astronaut Rides"]

Here is the image generated for "Cartoon Monkey Playing Computer":

[Image: generated picture for "Cartoon Monkey Playing Computer"]

Epilogue

Practical Experience and Outlook

After trying Amazon SageMaker end to end, I found it to be a powerful and easy-to-use machine learning platform. First, it supports a variety of machine learning frameworks, so I can pick the one I know best or that best fits my needs to build, train, and deploy models without worrying about environment setup.

As Case 1 shows, SageMaker provides many pre-built machine learning algorithms that cover a variety of use cases and problem types, so users can easily pick one that suits their needs. As Case 2 shows, SageMaker offers a variety of integrated deployment options, including managed endpoints, managed containers, and Amazon Lambda functions, which lets users deploy models to the environment they need, whether in the cloud or on-premises. In addition, SageMaker's documentation is rich and detailed, so users can quickly find help when they run into problems. Amazon SageMaker also offers advanced security and privacy mechanisms, such as data encryption, authentication, and access control, which protect user data and models and help keep machine learning applications secure and trustworthy.

Overall, compared with existing machine learning platforms, the core strength of Amazon SageMaker is building, training, and deploying machine learning applications quickly. It is well suited to being combined with various application fields to rapidly deliver a complete solution for building enterprise-level AI models.

During the trial I also found some shortcomings. The first is cost: using Amazon SageMaker can be expensive, especially with large datasets or long training runs. It is therefore less suited to individual users with small workloads and better matches enterprise-level AI application development.

Second, although Amazon SageMaker provides an easy-to-use console, APIs, and documentation, its learning curve can be steep because it involves several different technologies and tools. Inexperienced users may still need considerable time and trial and error to build up the necessary technical knowledge.

In addition, Amazon SageMaker offers limited freedom for customization, and many of its functions and services are tightly tied to the Amazon ecosystem. If a specific algorithm or open-source framework that the user needs (such as neural network, adversarial learning, etc.) is not part of the ecosystem Amazon SageMaker provides, custom development or integration may take additional time and effort.

Looking ahead, I expect Amazon SageMaker to keep investing in automated machine learning (AutoML) to provide a more complete, smarter, and more efficient model design and deployment experience and to flatten the learning curve. For fast-iterating products, I also hope it will provide smarter model management and monitoring, especially model version control, which team collaboration inevitably requires in order to manage and optimize models and to improve the reliability and stability of machine learning applications.

Cloud Discovery Lab

Finally, I would like to share Amazon's latest Cloud Discovery Lab activity. In the Cloud Discovery Lab, developers can connect with other developers through technical experiments, product trials, and case applications; create and share together; help and inspire one another; and explore cloud technology, opening up more possibilities for technical practice. The Cloud Discovery Lab is not only a space for hands-on experience but also a platform for sharing. Everyone is welcome to join.

The author of this article

Mr.Winter

Holds a master's degree in Control Science and Engineering from Tongji University, focuses on robot motion planning, and expands his technology stack in his spare time.

Origin blog.csdn.net/u012365585/article/details/130437378