Build a fine-grained sentiment analysis application based on Amazon SageMaker

Background

Fine-grained sentiment analysis, also known as Aspect-Based Sentiment Analysis (ABSA), is attracting a growing number of practitioners because of its broad business value. Analyzing the sentiment expressed in customer reviews helps companies understand what customers care about, uncover customer needs, accelerate product iteration, improve marketing efficiency, and improve after-sales service. It is no exaggeration to say that a company that listens to the voice of the customer gains a head start in its development.

Voice of Customer (VOC) is a concept that many industries have paid attention to in recent years. Amazon, as the world's most "customer-centric" company, launched its "Voice of Customer" (VOC) section as early as 2018 to help buyers and sellers achieve win-win results. Amazon has always been committed to "customer-centric" technological innovation. For VOC scenarios, we have carried out extensive technical practice based on the actual business needs of customers in different industries, and have built an efficient, customizable, fine-grained generative review analysis solution that combines natural language generation (NLG) technology with Amazon SageMaker. The solution extracts the sentiment polarity of the different aspects mentioned in a review and uses the opinion words as supporting evidence.

For example, the sentence "Today's salad is delicious but the steak is not fresh" carries two sentiment polarities at once. For the positive sentiment, the aspect word is "salad" and the corresponding opinion word is "delicious"; for the negative sentiment, the aspect word is "steak" and the corresponding opinion word is "not fresh". In such a scenario, we use AI to extract higher-dimensional knowledge from user messages (aspect words, aspect categories, opinion words, and sentiment polarity), which is equivalent to attaching tens of thousands of machine-generated tags to users. These tags can be used to segment users and to optimize advertising, behavior guidance, customer service, and product upgrades.
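
To make the extracted elements concrete, the example sentence can be written down as structured records. The following is a minimal sketch: the class name `SentimentTuple`, its field names, and the helper `aspects_with_polarity` are illustrative assumptions and not part of the solution's code. The aspect category is left empty because the example does not state one.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SentimentTuple:
    """One fine-grained sentiment record extracted from a review (illustrative)."""
    aspect: str                     # aspect word, e.g. "salad"
    opinion: str                    # opinion word used as evidence
    polarity: str                   # "positive" or "negative"
    category: Optional[str] = None  # aspect category; not given in the example


def aspects_with_polarity(records: List[SentimentTuple], polarity: str) -> List[str]:
    """Return the aspect words whose polarity matches the requested one."""
    return [r.aspect for r in records if r.polarity == polarity]


# The example sentence: "Today's salad is delicious but the steak is not fresh"
review_records = [
    SentimentTuple(aspect="salad", opinion="delicious", polarity="positive"),
    SentimentTuple(aspect="steak", opinion="not fresh", polarity="negative"),
]

print(aspects_with_polarity(review_records, "negative"))
```

Keeping each record as a small immutable object makes it straightforward to aggregate records across many reviews, for example to count negative mentions per aspect.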

Amazon SageMaker is a fully managed machine learning platform service from Amazon Web Services. On this platform, algorithm engineers and data scientists can quickly build, train, and deploy machine learning (ML) models without having to manage or operate the underlying resources. As a toolset, it provides end-to-end components for machine learning, including data labeling, data processing, algorithm design, model training, training debugging, hyperparameter tuning, model deployment, and model monitoring, which makes machine learning simpler and more accessible. It also draws on Amazon's underlying infrastructure to provide rich computing resources such as high-performance CPUs, GPUs, and elastic inference accelerators, making model development and deployment easier and more efficient. This article also builds on Hugging Face, a well-known open source NLP community that integrates closely with Amazon SageMaker: NLP models can be trained and deployed on Amazon SageMaker with a few lines of code.

Solution overview

In this example, we will walk through the following steps with Amazon SageMaker:

  • Environment preparation
  • Data preparation
  • Model training with Amazon SageMaker BYOS
  • Managed deployment and inference testing

Environment preparation

We first need to create an Amazon SageMaker notebook instance. Since no GPU training runs locally on the notebook in this experiment, any notebook instance type can be selected.

After the notebook starts, open a terminal from the notebook page and run the following commands to download the code:

cd ~/SageMaker
git clone https://github.com/HaoranLv/GAS-SageMaker.git

Data preparation

ABSA currently comprises several subtasks, each focusing on different sentiment elements. The current mainstream ABSA tasks are:

Because there are many subtasks, this article uses TASD (Target Aspect Sentiment Detection) as the example task. The data is stored in txt files in a fixed format. The specific format is:

 

All data used comes from publicly available datasets and is stored under ./data/tasd/.

For a detailed introduction to the datasets, refer to:

Model training with Amazon SageMaker BYOS

For the current mainstream deep learning frameworks (TensorFlow, PyTorch, MXNet), Amazon SageMaker provides prebuilt container images. In other words, if an open source project is written with one of these frameworks, we can in principle run it on Amazon SageMaker in BYOS (Bring Your Own Script) mode.

First, we preprocess the data and save it locally or in S3. We then only need to prepare the training script and its dependencies before instantiating the estimator for model training and deployment. When training starts, SageMaker launches an EC2 training instance, loads the prebuilt image, and starts the container; the data is then passed to the training instance as an input channel for model training. The training logs are stored in CloudWatch, and after training completes the trained model files are saved to a specific location on S3 for later use. This article demonstrates the BYOS workflow for the generative ABSA task under the PyTorch framework.

Open gabsa.ipynb in Jupyter Notebook and run it cell by cell.

Import dependencies and configure permissions

import sagemaker
from time import gmtime, strftime
from sagemaker import get_execution_role
from sagemaker.pytorch import PyTorch

# Create a SageMaker session and look up the IAM role attached to this notebook
sess = sagemaker.Session()
role = get_execution_role()

print(f"sagemaker role arn: {role}")
print(f"sagemaker bucket: {sess.default_bucket()}")
print(f"sagemaker session region: {sess.boto_region_name}")

 

Upload processed data to S3

WORK_DIRECTORY = "./data"
# S3 prefix
prefix = "demo"
data_location = sess.upload_data(WORK_DIRECTORY, key_prefix=prefix)

Define the hyperparameters. This experiment initializes the model from the T5-base pretrained weights published on the Hugging Face Hub.

hyperparameters = {
    "task" : "tasd", 
    "dataset" : "rest15", 
    "model_name_or_path" : "t5-base", 
    "paradigm": "extraction",
    "eval_batch_size" :"16",
    "train_batch_size" :"2",
    "learning_rate" :"3e-4",
    "num_train_epochs":"30",
    "n_gpu": "1"
}
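
In script mode, SageMaker passes each entry of the hyperparameters dictionary to the entry-point script as a command-line argument (for example `--task tasd`). The following is a hedged sketch of how a training script might parse them. It is not the actual `finetune.py` from the repository; the function name `parse_hyperparameters` and the default values are assumptions for illustration.

```python
import argparse
from typing import List, Optional


def parse_hyperparameters(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse hyperparameters that arrive as --key value command-line pairs."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--task", type=str, default="tasd")
    parser.add_argument("--dataset", type=str, default="rest15")
    parser.add_argument("--model_name_or_path", type=str, default="t5-base")
    parser.add_argument("--paradigm", type=str, default="extraction")
    # Values are sent as strings, so argparse converts them to numbers here.
    parser.add_argument("--train_batch_size", type=int, default=2)
    parser.add_argument("--eval_batch_size", type=int, default=16)
    parser.add_argument("--learning_rate", type=float, default=3e-4)
    parser.add_argument("--num_train_epochs", type=int, default=30)
    parser.add_argument("--n_gpu", type=int, default=1)
    # Ignore any extra arguments instead of failing on them.
    args, _unknown = parser.parse_known_args(argv)
    return args


# Simulate the arguments that the container would pass to the script.
example_argv = [
    "--task", "tasd",
    "--dataset", "rest15",
    "--train_batch_size", "2",
    "--learning_rate", "3e-4",
    "--num_train_epochs", "30",
]
args = parse_hyperparameters(example_argv)
print(args.task, args.train_batch_size, args.learning_rate)
```

Because the values travel as strings, declaring the expected type per argument keeps numeric settings such as the learning rate from silently staying text.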

Instantiate the estimator. Since the code uses the PyTorch framework, the prebuilt SageMaker PyTorch container is used directly here.

entry_point = 'finetune.py'
source_dir = './'
git_config = None
framework_version = '1.7.1'
py_version='py36'
instance_type='ml.p3.2xlarge'
instance_count=1
estimator = PyTorch(
    entry_point = entry_point,
    source_dir = source_dir,
    git_config = git_config,
    role = role,
    debugger_hook_config=False,
    hyperparameters = hyperparameters,
    framework_version = framework_version, 
    py_version = py_version,
    instance_type = instance_type,
    instance_count = instance_count
)

Start model training

inputs = {'tasd': data_location+'/tasd/'}
response = estimator.fit(inputs)
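
Each key of the `inputs` dictionary becomes a named input channel. Inside the training container, SageMaker exposes the local directory of a channel through the `SM_CHANNEL_<NAME>` environment variable (here `SM_CHANNEL_TASD`), and the directory whose contents are packaged into model.tar.gz through `SM_MODEL_DIR`. The following is a minimal sketch of how a training script could resolve these paths, falling back to the default locations under /opt/ml. The helper name `resolve_training_paths` is hypothetical and is not taken from the repository.

```python
import os
from typing import Dict, Mapping


def resolve_training_paths(channel: str, env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the local data directory of a channel and the model output directory."""
    channel_var = f"SM_CHANNEL_{channel.upper()}"
    return {
        # Directory where the S3 data of this channel is made available.
        "data_dir": env.get(channel_var, f"/opt/ml/input/data/{channel}"),
        # Files saved here are uploaded to S3 as model.tar.gz after training.
        "model_dir": env.get("SM_MODEL_DIR", "/opt/ml/model"),
    }


paths = resolve_training_paths("tasd")
print(paths)
```

Passing the environment as a parameter keeps the helper easy to exercise outside a training container, where these variables are not set.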

After training starts, the training job appears in the Amazon SageMaker console. Click the job to open its details, where you can view the training log output and monitor GPU, CPU, and memory usage to confirm that the program is running normally. You can also view the training logs in CloudWatch after training completes.

Managed deployment and inference testing

After training, we can easily deploy the model as a real-time endpoint that can be called from a production environment.

import sagemaker
instance_type = 'ml.m5.4xlarge'
role = sagemaker.get_execution_role()
from sagemaker.pytorch.model import PyTorchModel

pytorch_model = PyTorchModel(model_data='s3://sagemaker-ap-southeast-1-116572824542/pytorch-training-2022-05-28-10-05-39-029/output/model.tar.gz', 
                             role=role,
                             entry_point='inference.py', 
                             source_dir='./', 
                             framework_version='1.7.1', 
                             py_version='py36'
                ) # TODO set model_server_workers=1 to avoid torchhub bug

predictor = pytorch_model.deploy(instance_type=instance_type, initial_instance_count=1)
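
The `entry_point` script tells the PyTorch serving container how to handle requests. The SageMaker PyTorch inference toolkit looks for optional handler functions named `model_fn`, `input_fn`, `predict_fn`, and `output_fn`. The sketch below covers only the request and response handlers for JSON payloads. It is an assumption about what such a script could look like, not the repository's actual `inference.py`, and the `{"inputs": ...}` payload shape is taken from the endpoint call shown in this article.

```python
import json
from typing import Any, List


def input_fn(request_body, request_content_type: str) -> List[str]:
    """Decode a JSON request of the form {"inputs": "<review text>"} into a list of texts."""
    if request_content_type != "application/json":
        raise ValueError(f"Unsupported content type: {request_content_type}")
    if isinstance(request_body, (bytes, bytearray)):
        request_body = request_body.decode("utf-8")
    payload = json.loads(request_body)
    inputs = payload["inputs"]
    # Accept either a single review or a list of reviews.
    return [inputs] if isinstance(inputs, str) else list(inputs)


def output_fn(prediction: Any, accept: str = "application/json") -> str:
    """Serialize the prediction as JSON; this sketch ignores other accept types."""
    return json.dumps(prediction, ensure_ascii=False)


texts = input_fn('{"inputs": "The lamp looks great."}', "application/json")
print(texts)
```

A complete script would also define `model_fn` to load the fine-tuned model from the extracted model.tar.gz and `predict_fn` to run generation on the decoded texts.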

After the deployment is complete, you can see the following status on the console:

 

 

We can then invoke the endpoint:

predictor.predict({"inputs": "Worth an hour of frustration to put together Although not the easiest product I’ve ever assembled, it was worth the few minutes of cursing. I have not decided where I will put the lamp but I’m glad I purchased it. Had to struggle with one that was lost but Amazon made that right by sending another. Others have complained about the shade but that was very simple to put together. It looks like a quality item especially when compared to floor lamps I’ve seen on the floor at stores like Home Goods or Lowes. I’m happy with it. "}, initial_args={'ContentType': 'application/json'})

The output is:

[( lamp, product, POSITIVE ); ( NULL, Services, POSITIVE )]
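
For downstream use, the generated output can be turned back into structured triplets. The following is a minimal sketch, assuming the endpoint returns text exactly in the form shown above: triplets in parentheses separated by semicolons, elements separated by commas, and NULL standing for a target that is not mentioned explicitly. The function name `parse_triplets` is hypothetical, and the element names (target, category, polarity) follow the TASD task used in this article.

```python
import re
from typing import List, Optional, Tuple

# (target, category, polarity); target is None when the output says NULL.
Triplet = Tuple[Optional[str], str, str]


def parse_triplets(output: str) -> List[Triplet]:
    """Parse generated text such as "[( a, b, POSITIVE ); ( NULL, c, NEGATIVE )]"."""
    triplets: List[Triplet] = []
    # Each triplet is the text between one pair of parentheses.
    for body in re.findall(r"\(([^()]*)\)", output):
        parts = [part.strip() for part in body.split(",")]
        if len(parts) != 3:
            continue  # skip malformed generations rather than guessing
        target, category, polarity = parts
        triplets.append((None if target == "NULL" else target, category, polarity.lower()))
    return triplets


raw = "[( lamp, product, POSITIVE ); ( NULL, Services, POSITIVE )]"
print(parse_triplets(raw))
```

Splitting on commas is a simplification: a target phrase that itself contains a comma would be skipped by the length check, so a production parser would need a more robust delimiter strategy.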

This completes the process of building a fine-grained sentiment analysis application with Amazon SageMaker. As shown, Amazon SageMaker combines conveniently with Hugging Face to cover the full cycle of building, training, and deploying NLP models: once the training script and data are prepared, training and deployment can be started with a few commands. In future articles, we will introduce ways to implement more NLP-related tasks with Amazon SageMaker, so stay tuned.

About the author

Lu Haoran

Applied scientist at Amazon Web Services, with long-term experience in research and development in computer vision, natural language processing, and related fields. Supports data lab projects and has rich experience in algorithm development and practical implementation in time series forecasting, object detection, OCR, and natural language generation.

Article source: https://dev.amazoncloud.cn/column/article/6309e3bcafd24c6ba216ff9e?sc_medium=regulartraffic&sc_campaign=crossplatform&sc_channel=CSDN

Origin blog.csdn.net/u012365585/article/details/130319816