Practical experience with Amazon SageMaker machine learning


(Statement: The author grants Amazon Cloud Technology the right to repost and adapt this article through its official channels, including but not limited to the Amazon Cloud Technology Developer Community, Zhihu, self-media platforms, and third-party developer media.)

At the re:Invent 2023 conference, Amazon Cloud Technology announced five new Amazon SageMaker capabilities that aim to accelerate the building, training, and deployment of large language models and other foundation models. These capabilities are intended to help users develop models and deploy applications faster. This article explains how Amazon SageMaker works and then walks through hands-on use of one of the machine learning environments that SageMaker provides.

Amazon SageMaker Principles

In machine learning, you train a computer to make predictions, or inferences. First, you use an algorithm and example data to train a model. Then you integrate the model into your application to generate inferences in real time and at scale. In a production environment, a model typically learns from millions of example data items and produces each inference with a latency that ranges from hundreds of milliseconds down to less than 20 milliseconds.

The following diagram illustrates a typical workflow for creating a machine learning model:

Generate sample data

To train the model, example data is required. The type of data required depends on the business problem you want the model to solve. For example, suppose you want to create a model that predicts digits given an input image of handwritten digits. To train such a model, example images of handwritten digits are required.

Retrieve data

Typically, you pull one or more datasets into a single repository.

Clean data

To improve model training, inspect your data and clean it if necessary.
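As a rough illustration of what cleaning can mean in practice, the sketch below drops incomplete examples and exact duplicates. The record layout (one numeric feature plus a label) and the function name `clean_records` are assumptions made for this example, not part of SageMaker.

```python
from typing import List, Optional, Tuple

# Hypothetical raw record: (feature, label). None marks a missing value.
RawRecord = Tuple[Optional[float], Optional[int]]


def clean_records(records: List[RawRecord]) -> List[Tuple[float, int]]:
    """Drop records with missing fields and exact duplicates, keeping first-seen order."""
    seen = set()
    cleaned = []
    for feature, label in records:
        if feature is None or label is None:
            continue  # incomplete example: skip it
        key = (feature, label)
        if key in seen:
            continue  # exact duplicate of an earlier record
        seen.add(key)
        cleaned.append(key)
    return cleaned


raw = [(0.5, 1), (None, 3), (0.5, 1), (0.9, None), (0.2, 7)]
print(clean_records(raw))
```

Real datasets usually need more than this (outlier checks, label validation), but the same inspect-then-filter pattern applies.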

Prepare or transform data

To improve model performance, you can apply additional data transformations. For example, you can combine several attributes into a single new attribute when the combination is more informative than the attributes taken separately.
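A minimal sketch of combining attributes is shown here. The attribute names `width`, `height`, and `area` are invented for illustration; which attributes are worth combining depends entirely on your data.

```python
def add_combined_attribute(rows):
    """Return new rows in which width and height are combined into an 'area' attribute.

    The input rows are not modified.
    """
    return [{**row, "area": row["width"] * row["height"]} for row in rows]


rows = [{"width": 2.0, "height": 3.0}, {"width": 4.0, "height": 0.5}]
combined = add_combined_attribute(rows)
print(combined)
```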

Train the model

To train a model, an algorithm or a pre-trained base model is required. The algorithm you choose depends on many factors. For a quick, out-of-the-box solution, you can also use one of the algorithms provided by SageMaker.

After training a model, you evaluate it to determine whether the accuracy of its inferences is acceptable. With the SageMaker Python SDK, you can train the model and then evaluate it by sending inference requests to it from one of the available IDEs.
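The evaluation step can be sketched without any SageMaker dependency: compare the model's predictions on held-out examples with the true labels and check the result against a threshold. The 0.9 default threshold and the function names are assumptions for this example only.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("at least one example is required")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def is_acceptable(predictions, labels, threshold=0.9):
    """True if accuracy reaches the (assumed) acceptance threshold."""
    return accuracy(predictions, labels) >= threshold


# Hypothetical digit predictions on five held-out images.
predicted = [7, 2, 1, 0, 4]
actual = [7, 2, 1, 0, 9]
print(accuracy(predicted, actual), is_acceptable(predicted, actual))
```

Accuracy is only one possible metric; for imbalanced data, precision and recall are often more informative.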

Deploy the model

Traditionally, a model had to be re-engineered before it could be integrated with an application and deployed. With SageMaker's managed services, you can deploy your model independently, which decouples it from your application code.
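One way to picture this decoupling: the application only needs an endpoint name and a runtime client, not the model code. The sketch below assumes the deployed endpoint accepts and returns JSON, which depends on the model container; the helper name `predict` and the endpoint name in the usage comment are hypothetical. `invoke_endpoint` is the boto3 SageMaker Runtime call.

```python
import json


def predict(runtime_client, endpoint_name, payload):
    """Send one JSON inference request to a deployed endpoint and decode the JSON reply.

    runtime_client is expected to behave like boto3.client("sagemaker-runtime").
    Assumption: the endpoint's container speaks JSON in both directions.
    """
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps(payload),
    )
    return json.loads(response["Body"].read())


# Usage against a real endpoint (needs boto3, AWS credentials, and a deployed model):
# import boto3
# client = boto3.client("sagemaker-runtime")
# print(predict(client, "my-digit-endpoint", {"instances": [[0.0, 0.5, 1.0]]}))
```

Passing the client in as a parameter keeps the helper easy to test with a stand-in object and lets the model behind the endpoint be replaced without touching application code.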

Now that we understand the basic principles, let's see how SageMaker works in practice.

Practical machine learning

The machine learning environments provided by SageMaker are shown below.

SageMaker provides 11 machine learning environments. For this hands-on walkthrough we choose one of them: SageMaker Studio Lab.

Register an account

You can register for a free account, and registration is easy. You only need to fill in an email address.

After submitting the form, check your inbox and verify your account.

Studio Lab Project

After logging in successfully, you can see the project overview in the Studio Lab user interface, as shown below.

The project contains all of your files and folders, including Jupyter notebooks, and you have full control over the files in your project. The project also includes a user interface based on JupyterLab. From this interface, you can interact with Jupyter notebooks, edit source code files, integrate with GitHub, and connect to Amazon S3.

Project preview

The following shows a Studio Lab project with the file browser open and the Studio Lab launcher displayed:

View environment

To view the environment in Studio Lab, you can use the Terminal or Jupyter Notebook. The following commands will work in the Studio Lab terminal.

To open the Studio Lab terminal, open the File Browser panel, select the plus sign (+) in the menu at the top of the file browser to open the launcher, and then select Terminal. In the Studio Lab terminal, list the conda environments by running the following command.

conda env list

This command outputs a list of conda environments and their locations in the file system. When you onboard to Studio Lab, the studiolab conda environment is activated automatically; the asterisk (*) marks the active environment. The following is an example of the listed environments.

# conda environments:
#
           default                  /home/studio-lab-user/.conda/envs/default
           studiolab             *  /home/studio-lab-user/.conda/envs/studiolab
           studiolab-safemode       /opt/amazon/sagemaker/safemode-home/.conda/envs/studiolab-safemode
           base                     /opt/conda
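If you want to inspect this listing from a notebook rather than by eye, the output can be parsed with a few lines of Python. This is a minimal sketch that assumes the column layout shown above and environment paths without spaces; `parse_conda_envs` is a name made up for this example.

```python
def parse_conda_envs(output):
    """Parse `conda env list` text into (name, path, is_active) tuples."""
    envs = []
    for line in output.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header comments
        parts = line.split()
        is_active = "*" in parts  # conda marks the active environment with *
        parts = [p for p in parts if p != "*"]
        if len(parts) < 2:
            continue  # lines without both a name and a path are ignored
        envs.append((parts[0], parts[-1], is_active))
    return envs


sample = """\
# conda environments:
#
default                  /home/studio-lab-user/.conda/envs/default
studiolab             *  /home/studio-lab-user/.conda/envs/studiolab
studiolab-safemode       /opt/amazon/sagemaker/safemode-home/.conda/envs/studiolab-safemode
base                     /opt/conda
"""

active = [name for name, _, is_active in parse_conda_envs(sample) if is_active]
print(active)
```

In a live session, the text could come from `subprocess.run(["conda", "env", "list"], capture_output=True, text=True).stdout` instead of the hard-coded sample.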

Core code

When a new instance is added to the project, it first appears with a status of Pending. After a short while, the status automatically changes to InService, which means the instance is ready to use. The core code is as follows:

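import sagemaker

# Create a SageMaker session and look up the default S3 bucket for this account and region.
sess = sagemaker.Session()
bucket = sess.default_bucket()

# Copy the sample images into the bucket. The leading "!" runs a shell command,
# so this line only works inside a Jupyter/IPython notebook cell.
!aws s3 sync s3://sagemaker-sample-files/datasets/image/caltech-101/inference/ s3://{bucket}/ground-truth-demo/images/

# Console link for checking that the upload succeeded.
print('Copy and paste the below link into a web browser to confirm the ten images were successfully uploaded to your bucket:')
print(f'https://s3.console.aws.amazon.com/s3/buckets/{bucket}/ground-truth-demo/images/')

# S3 location of the input images.
print('\nWhen prompted by SageMaker to enter the S3 location for input datasets, you can paste in the below S3 URL')
print(f's3://{bucket}/ground-truth-demo/images/')

# S3 location where the labeled output is written.
print('\nWhen prompted by SageMaker to Specify a new location, you can paste in the below S3 URL')
print(f's3://{bucket}/ground-truth-demo/labeled-data/')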
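The Pending to InService transition described before the code can also be waited on programmatically. The following is a generic polling sketch, not a SageMaker API: `get_status` is a hypothetical callable that you would back with whatever status lookup applies to your resource, and the status strings are the ones mentioned in this article.

```python
import time


def wait_until_in_service(get_status, poll_seconds=5.0, timeout_seconds=600.0,
                          sleep=time.sleep):
    """Poll get_status() until it returns "InService".

    Returns the number of seconds spent waiting between polls.
    Raises TimeoutError if the status has not changed within timeout_seconds.
    """
    waited = 0.0
    while True:
        status = get_status()
        if status == "InService":
            return waited
        if waited >= timeout_seconds:
            raise TimeoutError(f"status is still {status!r} after {waited} seconds")
        sleep(poll_seconds)
        waited += poll_seconds


# Simulated status source: two "Pending" answers, then "InService".
simulated = iter(["Pending", "Pending", "InService"])
# A no-op sleep keeps the demonstration instant.
print(wait_until_in_service(lambda: next(simulated), sleep=lambda _: None))
```

The `sleep` parameter exists only so the loop can be exercised without real delays; in normal use the default `time.sleep` applies.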

Compute instance type

The Amazon SageMaker Studio Lab project runtime is based on an EC2 instance. Availability of compute instances is not guaranteed and depends on demand. If you need additional storage or compute resources, consider switching to SageMaker Studio.

CPU and GPU

Amazon SageMaker Studio Lab offers CPU and GPU options.

CPU

A CPU is designed to handle a wide range of tasks efficiently, but it is limited in how many tasks it can run concurrently. For machine learning, a CPU is recommended for compute-intensive algorithms such as time series analysis, forecasting, and models on tabular data.

The CPU compute type can run for up to 4 hours at a time, with a limit of 8 hours in a 24-hour period.

GPU

A GPU is designed to render high-resolution images and video concurrently, which makes it well suited to highly parallel workloads. A GPU is recommended for deep learning tasks, especially Transformers and computer vision.

The GPU compute type is limited to 4 hours at a time and 4 hours in a 24-hour period.
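The session limits quoted above reduce to simple arithmetic. The sketch below only encodes the numbers from this article (4 hours per session; 8 hours per 24 hours for CPU, 4 for GPU). How Studio Lab actually tracks usage is not described here, so treat the helper as an illustration rather than a description of the service.

```python
# (max hours per session, max hours per 24-hour period), as quoted in this article.
LIMITS_HOURS = {"CPU": (4.0, 8.0), "GPU": (4.0, 4.0)}


def max_session_hours(compute_type, hours_used_last_24h):
    """Longest session that still fits within both limits."""
    if hours_used_last_24h < 0:
        raise ValueError("hours_used_last_24h must not be negative")
    per_session, per_day = LIMITS_HOURS[compute_type]
    remaining_today = max(0.0, per_day - hours_used_last_24h)
    return min(per_session, remaining_today)


print(max_session_hours("CPU", 6.0))  # CPU with 6 hours already used
print(max_session_hours("GPU", 1.5))  # GPU with 1.5 hours already used
```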

To recap: SageMaker provides 11 machine learning environments, and this walkthrough uses SageMaker Studio Lab. You can register for a free account and, once it is verified, log in and try out a Studio Lab project.

Summary

This article introduces how Amazon SageMaker works and how to use it in a practical machine learning environment. The basic workflow of machine learning includes generating sample data, training the model, and deploying the model.

Peter DeSantis' speech demonstrated Amazon's determination to continue to innovate in the field of cloud computing. Serverless is not only a technological breakthrough, but also a new paradigm that meets the needs of enterprises. These innovative products and services will bring greater flexibility, efficiency and cost-effectiveness to developers and enterprises, opening up a new path for the future of cloud computing.


Origin blog.csdn.net/qq_36478920/article/details/135013626