A booster for your Amazon SageMaker machine learning journey

Authorization statement: The author authorizes Amazon Cloud Technology to repost and adapt this article, including but not limited to on official Amazon Cloud Technology channels such as the Amazon Cloud Technology Developer Community, Zhihu, self-media platforms, and third-party developer media.

1. Introduction

        In today's digital era, artificial intelligence and machine learning have become important engines of social progress. At the 2023 re:Invent global conference, Amazon Cloud Technology announced five new Amazon SageMaker capabilities:

  • Amazon SageMaker HyperPod reduces foundation model training time by up to 40% by providing purpose-built infrastructure for large-scale distributed training;

  • Amazon SageMaker Inference reduces foundation model deployment costs by an average of 50% and latency by an average of 20% by optimizing the use of accelerators;

  • Amazon SageMaker Clarify makes it easier for customers to quickly evaluate and select foundation models based on parameters that support responsible AI;

  • Amazon SageMaker Canvas helps customers accelerate data preparation with natural language instructions and customize models using foundation models in just a few clicks;

  • BMW, Booking.com, Hugging Face, Perplexity, Salesforce, Stability AI, and Vanguard are already using the new Amazon SageMaker capabilities.

Designed to help customers accelerate the building, training, and deployment of large language models and other foundation models, these new features give users more powerful tools and resources for model development and application deployment. This article walks through hands-on experience with Amazon SageMaker to show how it can support your machine learning journey.

2. Overview of Amazon SageMaker

Creating a machine learning model by traditional means requires developers to start with data preparation, then work through visualization, algorithm selection, framework setup, model training, tuning of what may be millions of parameters, deployment, and performance monitoring. This process often has to be repeated many times, which makes it tedious and extremely time-consuming.

The following is a typical workflow for creating a machine learning model:
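
As a rough illustration of that loop, here is a toy sketch in plain Python. It is not SageMaker code: the data, the one-parameter "model", and every function name are hypothetical, chosen only to show where data preparation, training, parameter tuning, and evaluation sit relative to each other.

```python
# Toy walk-through of the workflow: prepare data, train, tune, evaluate.
# The data, the one-parameter "model", and all names are hypothetical.

def prepare(rows):
    """Data preparation: drop rows whose feature value is missing."""
    return [(x, y) for x, y in rows if x is not None]

def predict(threshold, x):
    """The 'model': predict class 1 when x exceeds the threshold."""
    return 1 if x > threshold else 0

def accuracy(threshold, rows):
    """Share of rows whose predicted class matches the label."""
    hits = sum(predict(threshold, x) == y for x, y in rows)
    return hits / len(rows)

def tune(rows, candidates):
    """Parameter tuning: evaluate each candidate and keep the best."""
    return max(candidates, key=lambda t: accuracy(t, rows))

raw = [(1.0, 0), (2.0, 0), (None, 1), (3.0, 0), (6.0, 1), (7.0, 1), (8.0, 1)]
clean = prepare(raw)                      # the row with a missing value is dropped
train, holdout = clean[:4], clean[4:]     # simple train / hold-out split
best_threshold = tune(train, [0.5, 2.5, 4.5, 6.5])
holdout_accuracy = accuracy(best_threshold, holdout)
print(best_threshold, holdout_accuracy)
```

Even in this tiny version, the tuning step re-evaluates the model once per candidate, which hints at why the real process becomes repetitive when there are many parameters.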

Amazon SageMaker, by contrast, is a fully managed service that provides a one-stop machine learning development environment: everything from data preparation through model training to model deployment can be completed in the cloud. This is convenient and fast, and can bring significant performance improvements. Amazon SageMaker provides the following machine learning environments:

  • Amazon SageMaker Studio: Allows you to build, train, debug, deploy, and monitor your machine learning models.
  • Amazon SageMaker Notebook Instance: Allows you to prepare and process data, as well as train and deploy machine learning models from a compute instance running a Jupyter Notebook application.
  • Amazon SageMaker Studio Lab: Studio Lab is a free service that gives you access to AWS computing resources in an environment based on open-source JupyterLab, without requiring an AWS account.
  • Amazon SageMaker Canvas: Enables you to use machine learning to generate predictions without writing code.
  • Amazon SageMaker Geospatial: Enables you to build, train, and deploy geospatial models.
  • RStudio on Amazon SageMaker: RStudio is an IDE for R with a console that supports direct code execution, a syntax-highlighting editor, and tools for plotting, history, debugging, and workspace management.

If you do not want to deal with hardware, software, and infrastructure issues, want to simplify the machine learning model development process, and want to flexibly choose algorithms, models, and resources to meet different business needs, you can choose Amazon SageMaker with confidence.

3. Application advantages of Amazon SageMaker in production environments

When machine learning is put into practice, deploying the model to the production environment is a critical task. A production environment requires not only high model performance but also high availability and scalability. This section looks at the advantages of using Amazon SageMaker in a production environment.

  1. High performance: Amazon SageMaker can use the computing resources of Amazon Cloud Technology to provide users with high-performance machine learning model training and deployment. It supports a variety of deep learning frameworks, including TensorFlow, PyTorch, etc., to meet different types of application needs.
  2. High availability: Amazon SageMaker ensures high availability of your models by automatically scaling clusters and data stores across multiple availability zones. This means the model remains stable even during traffic peaks or server failures.
  3. Automation: Amazon SageMaker provides automated model deployment tools that can automatically convert trained models into production-ready versions and deploy them to the cloud or edge devices. This significantly reduces model deployment complexity and human error rates.
  4. Security: Amazon SageMaker provides complete security controls, including data encryption, access control, and security auditing functions, which can protect the security of user data and models.
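
To make the scaling point in item 2 more concrete, here is a hedged sketch of the request parameters for a target-tracking scaling policy on an endpoint variant. The code only builds dictionaries and makes no AWS calls; in practice they would be passed to the Application Auto Scaling API, for example boto3's `register_scalable_target` and `put_scaling_policy`. The endpoint name, variant name, capacities, target value, and the helper function itself are assumptions for illustration.

```python
def endpoint_scaling_requests(endpoint_name, variant_name,
                              min_instances=1, max_instances=4,
                              invocations_per_instance=100.0):
    """Build (scalable target, scaling policy) parameters for one endpoint variant.

    Hypothetical helper: the names and default numbers are illustrative only.
    """
    resource_id = f"endpoint/{endpoint_name}/variant/{variant_name}"
    dimension = "sagemaker:variant:DesiredInstanceCount"

    # Declares how far the variant's instance count may scale in and out.
    target = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": dimension,
        "MinCapacity": min_instances,
        "MaxCapacity": max_instances,
    }

    # Keeps invocations per instance near the target value.
    policy = {
        "PolicyName": f"{endpoint_name}-invocations-target",
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": dimension,
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": invocations_per_instance,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance",
            },
        },
    }
    return target, policy


# "demo-endpoint" and "AllTraffic" are placeholder names.
target, policy = endpoint_scaling_requests("demo-endpoint", "AllTraffic")
print(target["ResourceId"])
```

Separating the request construction from the API call keeps the configuration easy to review before anything is applied to a live endpoint.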

4. How Amazon SageMaker empowers every corporate role with machine learning capabilities

        Many developers with a computing background are probably already familiar with using Amazon SageMaker to build machine learning models. But can people without that background also use Amazon SageMaker to apply machine learning in their daily business scenarios? The answer is yes. Amazon SageMaker Canvas lets you use machine learning to generate predictions without writing any code. Below, I use a publicly available data set of diabetes patients (including historical data), which contains more than 15 features related to patient and hospital outcomes and a total of 16,000 rows, to build a model in Amazon SageMaker Canvas with zero code. The model predicts whether a high-risk diabetic patient is likely to be readmitted to hospital within 30 days, after 30 days, or not at all. The steps are as follows:

1. In the Amazon SageMaker console, select Canvas and open it

2. On entering the Amazon SageMaker Canvas interface, a guided prompt introduces three areas: dataset management, model building, and prediction

3. Select New model and create a new model

4. Import the data set and preview it. The data set contains 15 feature fields related to patient and hospital outcomes

5. Canvas offers two build modes: Quick build and Standard build. Quick build produces a model faster but with lower accuracy; Standard build takes longer but yields higher accuracy.

6. Select the readmitted (readmission) field as the target field to predict

In the preview below, we can see each feature's values, whether it has missing values, and its correlation with the target, and we can filter features or feature combinations as needed. The feature distributions show whether there is any bias or imbalance in the features. SageMaker Canvas can automatically identify missing values in the data and fill them in with adjacent values. By combining business logic with the correlation to the target, we can make an initial selection of features.
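
To illustrate what filling a gap "with adjacent values" can mean, here is a small plain-Python sketch. It is an assumption about the general idea, not Canvas's actual implementation: each missing entry (represented as `None`) takes the nearest earlier value, and gaps at the very start take the next available value. The function names and sample column are hypothetical.

```python
def missing_ratio(values):
    """Fraction of entries that are missing (None)."""
    return sum(v is None for v in values) / len(values)

def fill_with_adjacent(values):
    """Fill None entries from the previous value, then leading gaps from the next."""
    filled = list(values)

    # Forward pass: carry the last seen value into later gaps.
    last = None
    for i, v in enumerate(filled):
        if v is None:
            filled[i] = last
        else:
            last = v

    # Backward pass: leading gaps had no earlier value, so use the next one.
    nxt = None
    for i in range(len(filled) - 1, -1, -1):
        if filled[i] is None:
            filled[i] = nxt
        else:
            nxt = filled[i]
    return filled


column = [None, 3, None, None, 7, None]   # hypothetical feature column
print(missing_ratio(column))
print(fill_with_adjacent(column))
```

A column that is entirely missing stays entirely `None`, which is a useful signal that the feature should probably be dropped rather than imputed.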

7. We can also select Preview model to quickly estimate how the model would perform under the current configuration and to view the impact of each feature, which supports interactive, step-by-step refinement

We can see that num_lab_procedures (number of laboratory procedures) and num_medications (number of medications) have a relatively large impact on the prediction results, while fields such as patient gender have little relevance. We can remove low-impact fields before the subsequent model training.
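
One common way to reason about feature impact is to scramble a single column and measure how much accuracy drops. The sketch below uses that permutation-style idea on made-up data; it is not a description of how Canvas computes its impact scores, and the rows, labels, threshold, and toy model are all hypothetical.

```python
def accuracy(model, rows, labels):
    """Share of rows where the model's prediction equals the label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)

def column_impact(model, rows, labels, column):
    """Accuracy lost when one column is rotated by one row.

    Rotating the column is a simple deterministic permutation that
    breaks the link between that feature and the labels.
    """
    baseline = accuracy(model, rows, labels)
    values = [r[column] for r in rows]
    rotated = values[1:] + values[:1]
    scrambled = [r[:column] + (v,) + r[column + 1:] for r, v in zip(rows, rotated)]
    return baseline - accuracy(model, scrambled, labels)


# Each row is (number of lab procedures, gender code); all values are invented.
rows = [(60, 0), (20, 1), (55, 1), (10, 0)]
labels = [1, 0, 1, 0]

def toy_model(row):
    """Hypothetical model that looks only at the first column."""
    return 1 if row[0] > 40 else 0

print(column_impact(toy_model, rows, labels, 0))  # column the model relies on
print(column_impact(toy_model, rows, labels, 1))  # column the model ignores
```

A column whose scrambling leaves accuracy unchanged is a candidate for removal, which matches the intuition behind dropping low-impact fields before training.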

8. After selecting the feature combination, we can start building the model

SageMaker Canvas can automatically clean the data, build up to 250 models, and select the best one. We can train the model with either Quick build or Standard build: Quick build usually takes only 2-15 minutes, while Standard build takes 2-4 hours but can provide higher accuracy and can be shared with SageMaker Studio in one click. In principle, the fully trained model should be more accurate than the preview estimate seen earlier.

9. Model build results: the overview page shows a prediction accuracy of 56.716% along with the impact of each feature. The scoring page shows the exact numbers of correct and incorrect predictions.
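
For readers who want to see how an accuracy figure relates to counts of correct and incorrect predictions, here is a minimal sketch based on a confusion matrix for the three outcome classes. The counts are invented for illustration and are not the results of the model built above; the function name is hypothetical.

```python
def score_summary(matrix):
    """Summarize a confusion matrix.

    matrix[i][j] is the number of rows whose true class is i
    and whose predicted class is j.
    """
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return {
        "correct": correct,
        "errors": total - correct,
        "accuracy": correct / total,
    }


# Hypothetical counts; class order: within 30 days, after 30 days, not readmitted.
confusion = [
    [5, 1, 0],
    [2, 6, 2],
    [1, 1, 7],
]
print(score_summary(confusion))
```

The diagonal holds the correct predictions, so accuracy is simply the diagonal total divided by the number of rows scored.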

10. Use models to make predictions

After the model is built, you can use it to make predictions for individual records

With this model, we can clearly see which indicators most strongly influence whether a high-risk diabetic patient is readmitted within 30 days, after 30 days, or not at all. This makes it possible to advise patients in advance on which health matters to watch in order to avoid readmission, which is of considerable value to research in the medical and health field.

11. Takeaways

That is the complete workflow for Amazon SageMaker Canvas. Several things stood out to me along the way:

1. Preview data

After data is imported, analysts can quickly get a sense of the overall data quality, the data type of each feature, whether there are missing values, and statistics such as the mean and mode, which greatly reduces downstream problems caused by poor data quality.

2. Simple feature correlation analysis after construction

Usually, the selection of features is based on business experience. The system also provides a quick feature impact analysis in this regard, helping analysts to screen out unnecessary features and speed up model construction.

3. Non-specialist users can use it on their own

Overall, Canvas is very helpful for customers who need data analysis and want to explore the entire process of model creation, analysis, and prediction on their own. Experiencing first-hand what machine learning can do for business analysis is valuable, and it genuinely lets machine learning empower every department of the enterprise by putting that capability into the hands of every corporate role.

5. Conclusion

        Of course, Amazon SageMaker offers more than Canvas: you can use Data Wrangler to preprocess and clean data such as user behavior data; use Studio for model training, with the AutoML capability automating part of the model optimization process; and finally deploy the trained model to the production environment and use Amazon SageMaker's monitoring capabilities to monitor and manage it in real time.
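
For the AutoML step, a code-based route might look roughly like the sketch below. It only assembles the parameters that boto3's SageMaker `create_auto_ml_job` call accepts and does not contact AWS. The job name, S3 locations, role ARN, and the helper function are placeholders and assumptions; the target column reuses the readmitted field from the Canvas walkthrough.

```python
def automl_job_request(job_name, train_s3_uri, output_s3_uri, role_arn,
                       target_column="readmitted"):
    """Build keyword arguments for a SageMaker AutoML job (hypothetical helper)."""
    return {
        "AutoMLJobName": job_name,
        "InputDataConfig": [
            {
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": train_s3_uri,
                    }
                },
                "TargetAttributeName": target_column,
            }
        ],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        # Three outcome classes, so a multiclass problem scored on accuracy.
        "ProblemType": "MulticlassClassification",
        "AutoMLJobObjective": {"MetricName": "Accuracy"},
        "RoleArn": role_arn,
    }


# All values below are placeholders, not real resources.
request = automl_job_request(
    job_name="diabetes-readmission-demo",
    train_s3_uri="s3://example-bucket/diabetes/train/",
    output_s3_uri="s3://example-bucket/diabetes/output/",
    role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
)
print(request["AutoMLJobName"])
```

With credentials configured, a dictionary like this could be passed as `boto3.client("sagemaker").create_auto_ml_job(**request)`.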

        Overall, Amazon SageMaker is a powerful and comprehensive machine learning service. It provides users with a one-stop solution from data preparation to model deployment, greatly simplifying the machine learning process. Whether you are a beginner or an experienced developer, Amazon SageMaker can help you quickly and easily enter the world of machine learning.


 


Origin blog.csdn.net/m0_61243965/article/details/134991424