The Llama 2 base model from Meta is now available in Amazon SageMaker JumpStart. We can quickly deploy the Llama 2 model using SageMaker JumpStart and combine it with the open-source UI library Gradio to create a dedicated LLM application.
About Llama 2
Llama 2 is an autoregressive language model that uses an optimized Transformer architecture. It is intended for commercial and research use in English, and its context length is double that of the Llama 1 generation. The base models are currently available in three parameter sizes: 7B, 13B, and 70B.
(Source: https://ai.meta.com/llama/)
Simplifying large model deployment with SageMaker JumpStart
Amazon SageMaker is a one-stop machine learning (ML) development platform. Its JumpStart hub provides pre-trained models, built-in algorithms, and pre-built solutions to help you get started with ML quickly. Six variants of the Llama 2 model are available in SageMaker JumpStart.
If you don't see the model in JumpStart, confirm that the Region you are using supports Llama 2 (check the supported Regions on the JumpStart model page) and that you are running the latest version of SageMaker Studio (you can update your Studio version if not).
The figure below shows, for each of the six Llama 2 models, the model ID, the default instance type, and the maximum number of tokens supported in SageMaker. With the model_id, we can easily launch the corresponding model from a SageMaker notebook.
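As a quick reference, the model IDs and default instance types can be kept in a small lookup table. The values below reflect what JumpStart listed at the time of writing and may change; always verify them on the JumpStart model page for your Region. The `-f` suffix denotes the fine-tuned (chat) variants.

```python
# Llama 2 model IDs in SageMaker JumpStart mapped to their default
# instance types, as listed at the time of writing (verify in the
# JumpStart model card for your Region before deploying).
LLAMA2_MODELS = {
    "meta-textgeneration-llama-2-7b": "ml.g5.2xlarge",
    "meta-textgeneration-llama-2-7b-f": "ml.g5.2xlarge",    # chat variant
    "meta-textgeneration-llama-2-13b": "ml.g5.12xlarge",
    "meta-textgeneration-llama-2-13b-f": "ml.g5.12xlarge",  # chat variant
    "meta-textgeneration-llama-2-70b": "ml.g5.48xlarge",
    "meta-textgeneration-llama-2-70b-f": "ml.g5.48xlarge",  # chat variant
}
```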
Program overview
We will deploy the Llama-2-7b-chat model on SageMaker and use Gradio to build a front-end page, creating a lightweight chat assistant.
1. Deploy the model
In SageMaker, you can deploy inference endpoints either through JumpStart or from a notebook; we will show both approaches.
1.1 SageMaker JumpStart one-click deployment
In SageMaker Studio, search for the model and click through to its model page. Here we use the Llama-2-7b-chat model.
Click Deploy to deploy the model; deployment takes about 15 to 20 minutes. You can also change the instance type used for the deployment under Deployment Configuration.
After the deployment completes, you can see the corresponding inference endpoint information.
1.2 Deploy using SageMaker Notebook
If you deployed with JumpStart, you don't need to deploy again from a SageMaker notebook; skip directly to step 2.
(1) Set the model ID. Here we choose the 7B chat model.
(2) Deploy the specified model (meta-textgeneration-llama-2-7b-f)
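These two steps can be sketched as follows with the SageMaker Python SDK. This is a minimal sketch: it assumes the `sagemaker` package is installed and AWS credentials are configured, and the helper function name is our own.

```python
def deploy_llama2_chat(model_id="meta-textgeneration-llama-2-7b-f"):
    """Deploy a Llama 2 chat model from SageMaker JumpStart.

    Requires the `sagemaker` SDK and valid AWS credentials;
    deployment takes roughly 15-20 minutes.
    """
    from sagemaker.jumpstart.model import JumpStartModel

    model = JumpStartModel(model_id=model_id)
    # deploy() creates the endpoint on the model's default instance type
    return model.deploy()
```

The returned predictor object is what we later use to invoke the endpoint and, eventually, to clean it up.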
Deployment completes in about 15-20 minutes. Afterwards, you can see the running inference endpoint under the Endpoints tab on the SageMaker page of the Amazon Web Services console.
2. Set the parameters of the model
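The chat model accepts a JSON payload containing the dialog turns plus generation parameters. A minimal sketch of such a payload is shown below; the parameter values (`max_new_tokens`, `top_p`, `temperature`) are illustrative defaults you can tune.

```python
# Example request payload for the Llama 2 chat model on SageMaker.
# "inputs" holds one dialog: a list of {"role", "content"} messages;
# "parameters" controls generation (values here are illustrative).
payload = {
    "inputs": [[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is Amazon SageMaker?"},
    ]],
    "parameters": {
        "max_new_tokens": 512,  # cap on generated tokens
        "top_p": 0.9,           # nucleus sampling threshold
        "temperature": 0.6,     # lower = more deterministic
    },
}
```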
3. Start Gradio to interact with the deployed model
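A minimal sketch of wiring Gradio to the endpoint is shown below. It assumes `gradio` and `boto3` are installed, AWS credentials are configured, and the endpoint name is supplied by you; the function name and response parsing follow the JumpStart chat model's documented output shape, but verify against your deployed model.

```python
import json


def launch_chat_app(endpoint_name):
    """Serve a Gradio chat UI backed by a SageMaker endpoint.

    `endpoint_name` is the name of your deployed Llama 2 chat endpoint.
    Requires `gradio`, `boto3`, and AWS credentials; not run here.
    """
    import boto3
    import gradio as gr

    client = boto3.client("sagemaker-runtime")

    def respond(message, history):
        # Rebuild the dialog from Gradio's (user, assistant) history pairs
        dialog = []
        for user_msg, bot_msg in history:
            dialog.append({"role": "user", "content": user_msg})
            dialog.append({"role": "assistant", "content": bot_msg})
        dialog.append({"role": "user", "content": message})

        payload = {
            "inputs": [dialog],
            "parameters": {"max_new_tokens": 512, "top_p": 0.9,
                           "temperature": 0.6},
        }
        resp = client.invoke_endpoint(
            EndpointName=endpoint_name,
            ContentType="application/json",
            Body=json.dumps(payload),
            # Required: confirms acceptance of the Llama 2 license
            CustomAttributes="accept_eula=true",
        )
        result = json.loads(resp["Body"].read())
        return result[0]["generation"]["content"]

    # share=True also publishes a temporary public URL hosted by Gradio
    gr.ChatInterface(respond).launch(share=True)
```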
After it starts, Gradio prints both a local URL and a public URL hosted by Gradio for you to use.
Note that you must set custom_attributes="accept_eula=true" for calls to the inference endpoint to succeed. By doing so you confirm acceptance of Llama 2's End User License Agreement and Acceptable Use Policy.
The complete code is available at:
https://github.com/tsaol/llama2-on-aws.git
4. Test
Open the link provided by Gradio and you will see a chat page where you can try asking Llama 2 some questions.
5. Clean up and delete the environment
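To avoid ongoing charges, delete the endpoint and model when you are done. A minimal sketch using the predictor returned by the deployment step (the helper name is ours):

```python
def cleanup(predictor):
    """Delete the SageMaker model and endpoint to stop incurring charges.

    `predictor` is the object returned by model.deploy(); requires the
    `sagemaker` SDK and AWS credentials.
    """
    predictor.delete_model()
    predictor.delete_endpoint()
```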
Summary
This article described how to deploy the Llama 2 model with SageMaker JumpStart or a SageMaker notebook, and how to combine it with Gradio to easily build generative AI applications. Because these are managed services, you don't need to worry about building, operating, or maintaining the underlying infrastructure, while still getting a good experience with open-source projects. You can also further modify this solution to create your own large-model applications.
References
https://aws.amazon.com/cn/about-aws/whats-new/2023/07/llama-2-foundation-models-meta-amazon-sagemaker-jumpstart/
https://dev.amazoncloud.cn/column/article/64bf831469c6a22f966a19f4
https://aws.amazon.com/cn/blogs/machine-learning/llama-2-foundation-models-from-meta-are-now-available-in-amazon-sagemaker-jumpstart/
https://arxiv.org/pdf/2307.09288.pdf
https://www.gradio.app/guides
https://ai.meta.com/llama/
About the author
Cao Li
Cao Li is a solutions architect at Amazon Web Services, responsible for consulting on and designing enterprise IT solutions. He has more than 10 years of R&D experience at large state-owned enterprises and internet unicorns, where he led the design and implementation of the technical and data architectures of large-scale platforms. He focuses on the integration of digital intelligence and on generative AI, empowering enterprises to innovate and grow.