Llama 2 Cloud Deployment and API Call [AWS SageMaker]

Meta has just released Llama 2, its new large language model. If you're anything like us, you can't wait to get your hands on it and build with it.

The first step to building with any kind of LLM is to host it somewhere and consume it via an API. Then your developers can easily integrate it into your application. This guide shows how to host a Llama 2 model on Amazon SageMaker and consume the model through an API using AWS Lambda and AWS API Gateway.

Before getting started, log in to or sign up for an AWS account. New accounts automatically get Free Tier access, which does include some SageMaker credits, but keep an eye on usage: the bill can get ridiculously high depending on your server choice.

1. Why use Llama 2?

Why use Llama 2 when you could just use the OpenAI API?

Three reasons:

  • Security - keep sensitive data away from third-party vendors
  • Reliability - ensure your application's uptime doesn't depend on an external provider
  • Consistency - you control the model version, so the same question yields predictable results over time

2. Host the Llama 2 model

Once in the AWS console, search for SageMaker in the search bar and click it to open AWS SageMaker

AWS SageMaker is AWS's solution for deploying and hosting machine learning models.

2.1 Setting up a domain on AWS Sagemaker

Click Domains on the left sidebar

Click Create Domain

Make sure the Quick setup option is selected

Fill out the form with a domain name of your choice and keep the defaults for the remaining options.

If you're not familiar with IAM roles, select Create new role under Execution role. Otherwise, select a role you've created before.

Click "Submit" on the form to create your domain

Once the domain is created you will see this screen

Make a note of the user profile name you see here, as you'll need it to open Studio and deploy the model in the next step

If you get errors when creating your domain, it could be due to user permissions or VPC configuration.

2.2 Start a Sagemaker Studio session

Once the domain is created, click on the Studio link in the left sidebar

Select the domain and user profile you created earlier and click "Open Studio"

This will take you to a JupyterLab-based Studio session.

2.3 Selecting the Llama-2-7b-chat model

We will deploy the chat-optimized 7B version of the Llama 2 model.

There is a more powerful 70B model, but it costs too much for demonstration purposes, so we will use the smaller model

Click Models, notebooks, solutions in the left sidebar under SageMaker JumpStart

Search for the Llama 2 model in the search bar; we are looking for the 7B chat model. Click the model

If you don't see this model, you may need to close and restart your studio session

This will take you to the model page. You can change the deployment settings to best suit your use case, but we will keep the default SageMaker settings and deploy the model as-is

The 70B version requires a powerful GPU instance, so if your account doesn't have quota for it, the deployment may fail. In that case, submit a quota increase request through AWS Service Quotas.
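
If you do need a quota increase, you can also request it programmatically through the Service Quotas API. Below is a minimal boto3 sketch; the quota code is a hypothetical placeholder, so look up the real code for your instance type first:

import boto3

# Sketch: request a SageMaker quota increase via Service Quotas.
quotas = boto3.client('service-quotas')

# 'L-XXXXXXXX' is a hypothetical placeholder -- find the real quota code for
# your instance type with quotas.list_service_quotas(ServiceCode='sagemaker')
response = quotas.request_service_quota_increase(
    ServiceCode='sagemaker',
    QuotaCode='L-XXXXXXXX',
    DesiredValue=1.0,
)
print(response['RequestedQuota']['Status'])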

Wait 5-10 minutes for the deployment to complete and the confirmation screen to appear

Make a note of the model's endpoint name, as you'll need it to consume the model through the API.

At this point, you have completed the first part: hosting the model.
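
If you forgot to note the endpoint name, a short boto3 sketch like the following can list the endpoints in your account (assuming your AWS credentials and region are configured locally):

import boto3

# List SageMaker endpoints to recover the endpoint name
sm = boto3.client('sagemaker')

for endpoint in sm.list_endpoints()['Endpoints']:
    print(endpoint['EndpointName'], '-', endpoint['EndpointStatus'])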

3. Call the Llama 2 model through the API

First, open AWS Lambda and create a Lambda function, which will be used to call the LLM model's endpoint.

Search for the Lambda service in the AWS console search bar and click on it

Click Create Function:

Enter a function name (any name works), select Python 3.10 as the runtime and x86_64 as the architecture, then click Create function

3.1 Specifying the model endpoint

Here you will store the endpoint name of the LLM model from the previous step as an environment variable

Click on the Configuration tab of the newly created function

Click on Environment Variables and click Edit

Click "Add environment variable" on the next screen:

Enter ENDPOINT_NAME as the key and the model's endpoint name as the value. Click Save

You can use any name you like for the key, but it must match the variable name the function code reads later

3.2 Write the code to call the Llama model

Go back to the "Code" tab and copy and paste the following code

import os
import json
import boto3

# Read the endpoint name from the environment variable configured earlier
ENDPOINT_NAME = os.environ['ENDPOINT_NAME']

# Client for invoking SageMaker real-time inference endpoints
runtime = boto3.client('runtime.sagemaker')

def lambda_handler(event, context):
    # Forward the raw request body to the SageMaker endpoint.
    # Llama 2 requires accepting Meta's EULA via this custom attribute.
    response = runtime.invoke_endpoint(EndpointName=ENDPOINT_NAME,
                                       ContentType='application/json',
                                       Body=event['body'],
                                       CustomAttributes="accept_eula=true")

    # The endpoint returns JSON; decode it and return it to the caller
    result = json.loads(response['Body'].read().decode())

    return {
        "statusCode": 200,
        "body": json.dumps(result)
    }

After the code is inserted successfully, click "Deploy" to deploy:
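
Before wiring up API Gateway, you can sanity-check the handler locally. API Gateway delivers the HTTP body as a JSON string, so a test event must wrap the payload the same way. This is a minimal sketch, assuming the handler lives in a file named lambda_function.py, the ENDPOINT_NAME environment variable is set, and your local AWS credentials can reach the endpoint:

import json

from lambda_function import lambda_handler  # assumes the file name above

payload = {
    "inputs": [[{"role": "user", "content": "Say hello in one line"}]],
    "parameters": {"max_new_tokens": 64}
}

# API Gateway passes the HTTP body as a string, so serialize it first
event = {"body": json.dumps(payload)}
print(lambda_handler(event, None))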

4. Connect the Lambda function to AWS API Gateway

Go to the Lambda function's main screen and click Add trigger

Select the API Gateway menu item in the Add Trigger dialog

Fill out the API Gateway dialog box and click Add

After the API endpoint is created, you can find the API URL under the Configuration tab, in the Triggers section

5. Test the Llama 2 API

Make a POST request to your API URL with the following JSON body:

{
 "inputs": [
  [
   {"role": "system", "content": "You are chat bot who writes songs"},
   {"role": "user", "content": "Write a rap about Barbie"}
  ]
 ],
 "parameters": {"max_new_tokens":256, "top_p":0.9, "temperature":0.6}
}

The inputs key holds an array where each element is one conversation between you and the chatbot.

A conversation is a list of JSON objects, each with a role identifying the "person" speaking and a content field holding that turn's text. A multi-turn example is shown below.
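
For example, to continue a conversation, include the earlier turns in order, alternating user and assistant roles after the optional system message. A sketch of such a multi-turn body:

{
 "inputs": [
  [
   {"role": "system", "content": "You are chat bot who writes songs"},
   {"role": "user", "content": "Write a rap about Barbie"},
   {"role": "assistant", "content": "Yo, listen up, I got a story to tell..."},
   {"role": "user", "content": "Now write a verse about Ken"}
  ]
 ],
 "parameters": {"max_new_tokens":256, "top_p":0.9, "temperature":0.6}
}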

Check the API's response status code and response JSON. The status code should be 200, and the response JSON will look like this:

[{"generation": {"role": "assistant", 
"content": " Yo, listen up, I got a story to tell 
'Bout a doll that's fly, and she's doin' well
Barbie, Barbie, she's the queen of the scene
From Malibu to Kenya, she's always lookin' supreme
She's got the style, she's got the grace
She's got the swag, she's got the pace
She's got the cars, she's got the cash
She's livin' large, she's got the flash
She's got the perfect body, she's got the perfect face
She's got the perfect hair, she's got the perfect pace
She's got the perfect life, she's got the perfect vibe
She's got it all, she's got the perfect ride
She's got Ken by her side, he's her main man
He's got the skills, he's got the plan
They're livin' large, they're got the fame
They're the perfect couple, they're got the game
So listen up,"}}]

The following Python code can be used to test the API. Replace the value of api_url with the API URL you created in step 4:

import requests

api_url = 'https://spip03jtgd.execute-api.us-east-1.amazonaws.com/default/call-bloom-llm'

json_body = {
 "inputs": [
  [
   {"role": "system", "content": "You are chat bot who writes songs"},
   {"role": "user", "content": "Write a rap about Barbie"}
  ]
 ],
 "parameters": {"max_new_tokens":256, "top_p":0.9, "temperature":0.6}
}

r = requests.post(api_url, json=json_body)

print(r.json())
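
To pull out just the generated text rather than the full JSON, you can index into the first generation, assuming the response shape shown above:

# Extract only the assistant's reply from the response
reply = r.json()[0]['generation']['content']
print(reply)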

6. Possible errors

Along the way you may hit a couple of errors:

  • Permissions: If your Lambda execution role does not have permission to invoke SageMaker endpoints, the function call will fail; an example policy is shown after this list.
  • Timeouts: Depending on your prompt and parameters, you may get timeout errors. Unlike permissions, this is an easy fix: open the function's Configuration tab, go to General configuration, click Edit, and raise the timeout value.
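
For the permissions error, one fix is to attach a statement like the following to the Lambda execution role. This is a sketch; in production, scope Resource to your endpoint's ARN rather than using a wildcard:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "sagemaker:InvokeEndpoint",
      "Resource": "*"
    }
  ]
}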

Original link: Llama2 Cloud Deployment and Invocation—BimAnt
