Amazon SageMaker review


(Statement: The author grants Amazon Cloud Technology the right to repost and adapt this article through channels including, but not limited to,
official Amazon Cloud Technology channels such as the Amazon Cloud Technology Developer Community, Zhihu, self-media platforms, and third-party developer media.)

1 Introduction

At Amazon Cloud Technology re:Invent 2023, five new features of Amazon SageMaker were released. They aim to accelerate the building, training, and deployment of large language models and other foundation models, giving users more powerful tools and resources for faster model development and application deployment. I tried the new features hands-on and describe the experience and my impressions in detail below.

Open the Amazon Cloud Technology website and search for Amazon SageMaker to go directly to the product's main page. As a first-time user, I relied on the official tutorial as my reference, as shown in Figure 1-1 below. I chose the no-code ML tutorial, which generates machine learning predictions without writing code. The tutorial is very detailed, with a screenshot for each step. However, some of its screenshots and operations do not match the actual interface, which I will explain in detail later. Below, I start by building a domain and share my experience and impressions along the way.

Figure 1-1

2 Function experience

2.1 Build domain

I forgot to take a screenshot when building the domain. After entering the main interface, there is an eye-catching yellow "Configure Personal Domain" button on the right side that is visible at a glance, so getting started is relatively simple. After clicking this button, SageMaker builds the domain automatically (as shown in Figure 2-1). After about ten minutes, the build completes and you can proceed to the next steps, such as importing data.

Figure 2-1

2.2 Upload data set

After the domain is built, search for SageMaker Canvas to enter its main page and click the "Launch SageMaker Canvas" button shown in Figure 3-1 to build SageMaker Canvas automatically.

Figure 3-1

Figure 3-2 shows the build in progress. It takes about 15 minutes before the main page opens (Figure 3-3).

Figure 3-2


According to the tutorial, the next step after uploading the data is to build, train, and analyze the ML model. I downloaded the two datasets from the official tutorial, product_descriptions and shipping_logs, then searched for and opened the S3 console, entered the default bucket created by SageMaker Canvas, and uploaded the downloaded data (Figure 3-4).


Figure 3-5 and Figure 3-6 show the upload in progress and the successful upload, respectively. Once uploaded, the datasets can be accessed in subsequent operations. Uploading is also very fast: even with a relatively large amount of data, it does not take long to complete.
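
The same upload can also be scripted instead of done through the S3 console. The following is only a minimal sketch, assuming a client object that behaves like `boto3.client("s3")`. The helper name `upload_datasets`, the bucket name, and the file paths are placeholders and do not come from the tutorial.

```python
from pathlib import Path


def upload_datasets(s3_client, bucket, paths, prefix=""):
    """Upload local files to an S3 bucket and return the object keys used.

    s3_client is assumed to behave like boto3.client("s3"). It is passed in
    so the function can be tried out without AWS credentials.
    """
    keys = []
    for path in paths:
        # Object key = optional prefix + the file name without its directory.
        key = f"{prefix}{Path(path).name}"
        s3_client.upload_file(str(path), bucket, key)
        keys.append(key)
    return keys


# Example call (not executed here; the bucket name is a placeholder):
#   import boto3
#   upload_datasets(boto3.client("s3"), "my-canvas-bucket",
#                   ["product_descriptions.csv", "shipping_logs.csv"])
```

Passing the client in as a parameter keeps the helper independent of any particular AWS account or region configuration.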


2.3 Set up SageMaker Canvas

Next, there are some differences from the official operation manual. When I reopened Canvas to set up SageMaker Canvas, the operation manual said: "On the SageMaker Canvas interface, select Datasets in the left pane, and then select + Import." But after opening the page, I could not find Datasets in the left pane, so I selected Data Wrangler (the Datasets page shown in Figure 4-1) and went to the page for importing data. However, my page had no "Import" button, only a "Create" button (Figure 4-2). After clicking it, because my downloaded data is in .csv format, I chose to create a new Tabular dataset, since the prompt below Tabular mentions CSV.

Figure 4-1

Then, following the operation manual, select Amazon S3 under Data Source (Figure 4-3), and then select the folder containing the uploaded data (Figure 4-4). After finding the data, you can import it. The operation is simple and the prompts are clear, so this step is easy to complete by following the operation manual.

According to the operation steps, I should have imported the two downloaded datasets together, but the import was rejected with a prompt that the number of columns differs (Figure 4-5). I do not know whether the feature has been updated with new restrictions, so I went back to the Data Wrangler page and selected the official sample datasets directly for merging.

Figure 4-5
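
A column mismatch like this can be checked locally before importing. The sketch below compares the header rows of two CSV texts using only the standard library. The sample data is made up: `ProductId` and `ActualShippingDays` appear in this article, while `OrderID` and `ComputerBrand` are placeholder column names.

```python
import csv
import io


def read_header(csv_text):
    """Return the header row (list of column names) of CSV text."""
    return next(csv.reader(io.StringIO(csv_text)))


def compare_headers(a, b):
    """Report columns shared by both headers and those unique to each."""
    shared = [c for c in a if c in b]
    only_a = [c for c in a if c not in b]
    only_b = [c for c in b if c not in a]
    return shared, only_a, only_b


# Tiny made-up samples; the real tutorial files have more columns.
logs = "ProductId,OrderID,ActualShippingDays\nP1,1001,3\n"
products = "ProductId,ComputerBrand\nP1,BrandA\n"

logs_header = read_header(logs)
products_header = read_header(products)
shared, only_logs, only_products = compare_headers(logs_header, products_header)

print("column counts:", len(logs_header), len(products_header))
print("shared columns:", shared)
print("only in logs:", only_logs)
print("only in products:", only_products)
```

Two files with different column sets cannot simply be stacked into one table, which is consistent with the prompt described above. They need to be joined on a shared column instead.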

Following the tutorial, first select canvas-sample-shipping-logs.csv on the merge page and drag it to the panel on the right. Click the file and, as the figures show, each column is visualized: the values in each column are counted and drawn as a distribution chart, and hovering over a blue bar in any column shows its specific value (Figure 4-6 to Figure 4-8). This makes a large amount of data much more intuitive to grasp. As far as I know, other products do not yet offer a similar feature, and it feels novel and valuable.

Figure 4-6

Figure 4-7

Figure 4-8
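
To make the per-column view concrete, here is a rough sketch of the kind of counting that sits behind such a distribution chart. It is not how SageMaker Canvas implements it. The rows and the column names `Carrier` and `ShippingPriority` are made up for illustration.

```python
from collections import Counter


def column_distributions(rows):
    """Count how often each value occurs in every column of a list of dicts."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            counts.setdefault(column, Counter())[value] += 1
    return counts


# Made-up sample rows; the column names are illustrative only.
rows = [
    {"Carrier": "A", "ShippingPriority": "Ground"},
    {"Carrier": "A", "ShippingPriority": "Air"},
    {"Carrier": "B", "ShippingPriority": "Ground"},
]

distributions = column_distributions(rows)
for column, counter in distributions.items():
    print(column)
    # One text bar per distinct value, most frequent first.
    for value, count in counter.most_common():
        print(f"  {value:<8}{'#' * count} ({count})")
```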

Then select inner join with ProductId as the merge column. This still failed, with the error message shown in Figure 4-9 below.

Figure 4-9
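
For readers unfamiliar with the term, an inner join keeps only the rows whose key appears in both datasets. The following is a minimal sketch in plain Python, not the Canvas implementation. The sample rows are made up, and `ComputerBrand` is a placeholder column name.

```python
def inner_join(left_rows, right_rows, key):
    """Inner-join two lists of dicts on `key`, keeping only matching rows."""
    # Index the right side by key so each left row can be matched quickly.
    index = {}
    for row in right_rows:
        index.setdefault(row[key], []).append(row)

    joined = []
    for left in left_rows:
        # Left rows without a match on the right are dropped (inner join).
        for right in index.get(left[key], []):
            # Left columns come first; right values win on name clashes.
            joined.append({**left, **right})
    return joined


# Made-up sample rows keyed by ProductId.
shipping_logs = [
    {"ProductId": "P1", "ActualShippingDays": 3},
    {"ProductId": "P2", "ActualShippingDays": 5},
    {"ProductId": "P9", "ActualShippingDays": 2},  # no matching product
]
product_descriptions = [
    {"ProductId": "P1", "ComputerBrand": "BrandA"},
    {"ProductId": "P2", "ComputerBrand": "BrandB"},
]

merged = inner_join(shipping_logs, product_descriptions, "ProductId")
for row in merged:
    print(row)
```

In this sample, the row with `ProductId` `P9` is dropped because no product description matches it.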

So I imported product_descriptions.csv as the dataset product_descriptions and shipping_logs.csv as the dataset shipping_logs, and then merged the two datasets, but the error was still reported (Figure 4-10).

Figure 4-10

Since the error message only says that the merged data cannot be previewed, I ignored the error, clicked Import data in the lower left corner, and saved the result as ConsolidatedShippingData (because I had already tried twice before, a "(2)" suffix was added automatically to distinguish it) (Figure 4-11).

Figure 4-11

2.4 Build, train and analyze ML models

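Next comes building, training, and analyzing the ML model. Again, unlike the operation manual, I could not find a "Models" pane on the left side of the page. Since I needed to add a new model, I selected "My models" in the left pane and then clicked the new model button (Figure 5-1).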

Figure 5-1

After choosing to create a new model, a dialog lets you select the model type. Different problems require different models to solve and analyze. There are four problem types: Predictive analysis, Image analysis, Text analysis, and Fine-tune foundation models (Figure 5-2). In my opinion this is the biggest highlight of the feature, for reasons explained in the evaluation section later. Here, select the first one, Predictive analysis, and click Create.

Figure 5-2

The first step is to set the input data (Select): select the dataset just merged (Figure 5-3), and then click Select dataset to move on to the next step, Build.

Figure 5-3

The second step is to build the model (Build). Here you select the target column, that is, the data to predict. I selected the ActualShippingDays field, which predicts the time it takes for goods to arrive at their destination (Figure 5-4). SageMaker Canvas automatically tries to infer the problem type (Figure 5-5). After detecting time data, it inferred a time series forecasting problem, but what the customer wants to know is the required time, which is a specific number. Therefore, after clicking Configure model, we can select the type we need under "Model type". If we do not know which type to choose, "Model type" also provides hints with examples of the specific problems each type solves, which helps us choose the model more accurately. This is very convenient and friendly for newcomers.


At the same time, we can remove some irrelevant fields and then build the model. There are two options, Quick Build and Standard Build, to meet different needs (Figure 5-6). For a rough prediction, choose Quick Build, which builds a model in 15 minutes. For accurate predictions, choose Standard Build, which produces a more accurate model. Offering both options is another distinctive aspect of this feature.

Figure 5-6

After waiting about 7-8 minutes, the results are ready. There are three pages: Overview (Figure 5-7, Figure 5-8), Score (Figure 5-9), and Advanced Indicators (Figure 5-10, Figure 5-11). On the Overview page, SageMaker Canvas shows the column impact, that is, the estimated importance of each input column in predicting the target column, as a list of fields on the left with their percentages.

Figure 5-7

Figure 5-8

On the Score page, you can see a graph of the best-fit regression line for ActualShippingDays (Figure 5-9).

Figure 5-9

The Advanced Indicators page displays several metrics, including R2, mean absolute error (MAE), mean absolute percentage error (MAPE), and root mean square error (RMSE) (Figure 5-10), as well as an error density chart (Figure 5-11).

Figure 5-10

Figure 5-11
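
For readers who want to see what the four metrics named above measure, here is a small sketch using their standard textbook definitions. It is not taken from SageMaker Canvas, which may compute or round them differently. The sample values are made up.

```python
import math


def regression_metrics(actual, predicted):
    """Compute R2, MAE, MAPE (in percent), and RMSE for paired values.

    Assumes equal-length, non-empty inputs, no zero values in `actual`
    (needed for MAPE), and at least two distinct actual values (needed for R2).
    """
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]

    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mape = 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n

    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)                   # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    r2 = 1 - ss_res / ss_tot

    return {"R2": r2, "MAE": mae, "MAPE": mape, "RMSE": rmse}


# Made-up shipping days: actual values versus model predictions.
metrics = regression_metrics([2, 4, 6, 8], [3, 4, 5, 8])
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Lower MAE, MAPE, and RMSE mean smaller errors, while an R2 closer to 1 means the predictions explain more of the variation in the target.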

2.5 Generate prediction model

Then click Predict to generate a prediction model. Unlike in the operation manual, you can select Automatic (Figure 6-1) and then choose the previously merged dataset (Figure 6-2) for automatic prediction, or select Manual to import a dataset again. Either way, the prediction results are generated (Figure 6-3).

Figure 6-1

Figure 6-2

Figure 6-3

3 Comments and suggestions

Predicting from data from scratch with Amazon SageMaker felt very simple. Even on first contact, I could complete the prediction step by step by following the operation manual. It is also very friendly to users without a machine learning background, because many buttons and options have prompts beneath them, so users are not left confused. In addition, the interface is very clear and pages load quickly.

In terms of functionality, as mentioned above, there is an option to fine-tune a foundation model when choosing the type of model to create. I think this is the biggest highlight of the product, because large models are trained on other data. If I were in charge of an enterprise with good data for training, fine-tuning a foundation model on that data could improve the accuracy of its predictions and bring them more in line with our expectations.

Another highlight of the product is that, when building a model, it provides a "Correlation to target" column that can be used to judge the influence of each field on the model. According to the explanation it gives, a negative value means the field has a negative impact on the model, which can also be understood as a field that does not help the prediction. We can then use this value to uncheck some fields, reducing prediction time and improving efficiency (Figure 7-1).
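
This article does not say how Canvas computes "Correlation to target". As an illustration only, the sketch below assumes something like the Pearson correlation coefficient. The column names and values are made up. Under that assumption, a value near zero is what signals a weak linear relationship, while a strongly negative value still carries information, so pruning fields by sign alone deserves some care.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Made-up target and candidate input columns (names are hypothetical).
target = [2, 4, 6, 8]
columns = {
    "Distance": [100, 200, 300, 400],  # rises with the target
    "PriorityLevel": [4, 3, 2, 1],     # falls as the target rises
    "Noise": [5, 1, 1, 5],             # no linear relationship
}

correlations = {name: pearson(values, target) for name, values in columns.items()}
for name, value in correlations.items():
    print(f"{name}: {value:+.2f}")
```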

During use, I also found some minor flaws. For example, the width of the function bar is fixed, so images on the page sometimes cannot be fully displayed unless the page is zoomed. Also, it would be better if the product supported a customizable background color, so that the interface stays clear when the browser uses a dark theme.

Overall, it is a very efficient and novice-friendly product. You can make machine learning predictions through the visual interface without writing any code.

Figure 7-1


Origin blog.csdn.net/Alita233_/article/details/135001947