Deploy Llama 2 through Amazon SageMaker JumpStart to quickly build your own LLM applications


Meta's Llama 2 foundation models are now available in Amazon SageMaker JumpStart. We can quickly deploy a Llama 2 model with SageMaker JumpStart and combine it with the open source UI library Gradio to create a dedicated LLM application.

About Llama 2

Llama 2 is an autoregressive language model built on an optimized Transformer architecture, intended for commercial and research use in English, with a context length twice that of the first-generation Llama. The base models are currently available in three parameter sizes (7B, 13B, and 70B).


(Source: https://ai.meta.com/llama/)

Using SageMaker JumpStart to simplify large model deployment

Amazon SageMaker is a one-stop development platform for machine learning (ML), and SageMaker JumpStart is its ML hub, providing pre-trained models, built-in algorithms, and pre-built solutions to help you get started with ML quickly. SageMaker JumpStart provides six versions of the Llama 2 model.


If you don't see the model in JumpStart, confirm that the region you are using supports Llama 2 (the supported regions are listed on the JumpStart model page) and that you are running the latest version of SageMaker Studio (you can update your Studio version if needed).

The table below lists the model IDs, default instance types, and maximum number of tokens supported by each of the six Llama 2 models in SageMaker. Given a model_id, we can easily launch the corresponding model from a SageMaker notebook.

| Model name | model_id | Default instance type | Max total tokens |
| --- | --- | --- | --- |
| Llama-2-7b | meta-textgeneration-llama-2-7b | ml.g5.2xlarge | 4096 |
| Llama-2-7b-chat | meta-textgeneration-llama-2-7b-f | ml.g5.2xlarge | 4096 |
| Llama-2-13b | meta-textgeneration-llama-2-13b | ml.g5.12xlarge | 4096 |
| Llama-2-13b-chat | meta-textgeneration-llama-2-13b-f | ml.g5.12xlarge | 4096 |
| Llama-2-70b | meta-textgeneration-llama-2-70b | ml.g5.48xlarge | 4096 |
| Llama-2-70b-chat | meta-textgeneration-llama-2-70b-f | ml.g5.48xlarge | 4096 |

Solution overview

We will deploy the Llama-2-7b-chat model on SageMaker and build a front-end page with Gradio to create a lightweight chat assistant.

1. Deploy the model

In SageMaker, you can deploy an inference endpoint either through JumpStart or from a notebook; we will demonstrate both approaches.

1.1 SageMaker JumpStart one-click deployment

In SageMaker Studio, search for the model and click through to its model page. Here we use the Llama-2-7b-chat model.


Click Deploy to deploy the model; deployment takes about 15 to 20 minutes. You can also change the instance type used for the deployment under Deployment Configuration.


After the deployment is complete, you can see the corresponding endpoint information.

1.2 Deploy using SageMaker Notebook

If you have already deployed through JumpStart, you don't need to deploy again from a SageMaker notebook; skip ahead to Step 2.

(1) Set the model ID. Here we choose the 7B chat model:

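A minimal sketch of this step (the exact code in the original screenshot may differ slightly); the "-f" suffix denotes the fine-tuned chat variant:

```python
# Select the Llama-2-7b-chat JumpStart model; "*" picks the latest version.
model_id, model_version = "meta-textgeneration-llama-2-7b-f", "*"
```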

(2) Deploy the specified model (meta-textgeneration-llama-2-7b-f)

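A sketch of the deployment step, assuming a recent version of the SageMaker Python SDK that provides the JumpStartModel class:

```python
from sagemaker.jumpstart.model import JumpStartModel

# Create the JumpStart model and deploy it to a real-time endpoint.
# deploy() uses the model's default instance type (ml.g5.2xlarge for 7B)
# unless you override it with instance_type=...; newer SDK versions may
# also require accept_eula=True here.
model = JumpStartModel(model_id=model_id, model_version=model_version)
predictor = model.deploy()
```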

The deployment completes in about 15 to 20 minutes. Afterwards, you can see the running inference endpoint under the Endpoints tab on the SageMaker page of the Amazon Web Services console.


2. Set the parameters of the model

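A representative example of the request payload for a Llama 2 chat endpoint (the exact values in the original screenshot may differ):

```python
# Llama 2 chat endpoints expect a list of dialogs, each dialog being a
# list of {"role": ..., "content": ...} messages.
payload = {
    "inputs": [[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is Amazon SageMaker JumpStart?"},
    ]],
    "parameters": {
        "max_new_tokens": 512,  # upper bound on generated tokens
        "top_p": 0.9,           # nucleus sampling threshold
        "temperature": 0.6,     # sampling temperature
    },
}
```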

3. Start Gradio to interact with the deployed model

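A self-contained sketch of the Gradio front end. The endpoint name below is a placeholder for the one created in Step 1, and gr.ChatInterface assumes Gradio 3.39 or later, where history is a list of (user, assistant) pairs:

```python
import json

import boto3
import gradio as gr

smr = boto3.client("sagemaker-runtime")
ENDPOINT_NAME = "meta-textgeneration-llama-2-7b-f-..."  # replace with your endpoint name

def chat(message, history):
    # Rebuild the full dialog from Gradio's (user, assistant) history pairs.
    dialog = []
    for user_msg, assistant_msg in history:
        dialog.append({"role": "user", "content": user_msg})
        dialog.append({"role": "assistant", "content": assistant_msg})
    dialog.append({"role": "user", "content": message})

    payload = {
        "inputs": [dialog],
        "parameters": {"max_new_tokens": 512, "top_p": 0.9, "temperature": 0.6},
    }
    response = smr.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="application/json",
        Body=json.dumps(payload),
        CustomAttributes="accept_eula=true",  # required to call Llama 2 endpoints
    )
    result = json.loads(response["Body"].read())
    return result[0]["generation"]["content"]

# share=True also publishes a temporary public URL hosted by Gradio.
gr.ChatInterface(chat).launch(share=True)
```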

After it starts, Gradio prints both a local URL and a public URL hosted by Gradio for you to use.


Note that you need to set custom_attributes="accept_eula=true" to successfully invoke the inference endpoint. By doing so you confirm that you accept Llama 2's end-user license agreement and acceptable use policy.

The complete code is available at:

https://github.com/tsaol/llama2-on-aws.git

4. Test

Open the link provided by Gradio and you'll see a chat page where you can try asking Llama 2 some questions.


5. Clean up and delete the environment

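A sketch of the cleanup step, using the predictor returned by deploy() (if you deployed through the JumpStart UI instead, delete the endpoint from the SageMaker console):

```python
# Delete the model and endpoint to stop incurring charges.
predictor.delete_model()
predictor.delete_endpoint()
```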

Summary

This article described how to deploy the Llama 2 model using SageMaker JumpStart and SageMaker Notebook, and how to combine it with Gradio to easily build generative AI applications. Because these are managed services, you don't need to worry about building, operating, or maintaining the underlying infrastructure, while still enjoying a good experience with open source projects. You can also modify this solution further to create your own large model applications.

References

https://aws.amazon.com/cn/about-aws/whats-new/2023/07/llama-2-foundation-models-meta-amazon-sagemaker-jumpstart/

https://dev.amazoncloud.cn/column/article/64bf831469c6a22f966a19f4

https://aws.amazon.com/cn/blogs/machine-learning/llama-2-foundation-models-from-meta-are-now-available-in-amazon-sagemaker-jumpstart/

https://arxiv.org/pdf/2307.09288.pdf

https://www.gradio.app/guides

https://ai.meta.com/llama/

About the author


Cao Li

Solutions architect at Amazon Web Services, responsible for consulting on and designing enterprise IT solutions. He has more than 10 years of R&D experience at large state-owned enterprises and Internet unicorns, where he led the design and implementation of technical and data architectures for platforms at the tens-of-billions scale. He focuses on digital-intelligence integration and generative AI, helping enterprises innovate and grow.

