Play Llama2 fast! Alibaba Cloud Machine Learning PAI Launches Best Practices (3) - Rapid Deployment of WebUI

This practice uses the PAI-EAS module of Alibaba Cloud's machine learning platform PAI to deploy Llama-2-13B-chat. PAI-EAS is an online model serving platform that supports one-click deployment of models as online inference services or AI-Web applications. It scales elastically and suits developers who need cost-effective model serving.

Foreword

Meta recently announced that the large language model Llama2 is open source. It comes in three sizes, 7B, 13B, and 70B (7 billion, 13 billion, and 70 billion parameters), and each size has a dialogue-optimized variant, Llama-2-Chat. Llama2 is free for both research and commercial use (companies with more than 700 million monthly active users must apply for a license), giving companies and developers an up-to-date tool for large-model research.

At present, Llama-2-Chat outperforms other open-source dialogue models on most evaluation benchmarks and is not far behind some popular closed-source models (ChatGPT, PaLM). Alibaba Cloud's machine learning platform PAI promptly adapted the Llama2 series and has published best practices for scenarios such as full-parameter fine-tuning, LoRA fine-tuning, and inference services, helping AI developers get started out of the box. The specific steps are shown below.

[Past Best Practices]:

Play Llama2 fast! PAI launches best practices (1) - low-code LoRA fine-tuning and deployment

Play Llama2 fast! PAI launches best practices (2) - full-parameter fine-tuning

Best Practice 3: Llama2 Rapid Deployment of WebUI

1. Service Deployment

1. Go to the PAI-EAS Model Online Service page.

    1. Log in to the PAI console: https://pai.console.aliyun.com/
    2. In the left navigation bar, click Workspace List. On the workspace list page, click the name of the workspace you want to use to enter it.
    3. In the left navigation bar of the workspace page, select Model Deployment > Model Online Service (EAS) to open the PAI-EAS Model Online Service page.

2. On the PAI-EAS Model Online Service page, click Deploy Service.

3. On the deployment service page, configure the following key parameters.

| Parameter | Description |
| --- | --- |
| Service name | A custom service name. This example uses chatllm_llama2_13b. |
| Deployment method | Select the option that deploys an AI-Web application from an image. |
| Image selection | In the PAI platform image list, select chat-llm-webui with image version 1.0. Because versions iterate quickly, you can select the highest available image version at deployment time. |
| Run command | For the 13B model: python webui/webui_server.py --listen --port=8000 --model-path=meta-llama/Llama-2-13b-chat-hf --precision=fp16. For the 7B model: python webui/webui_server.py --listen --port=8000 --model-path=meta-llama/Llama-2-7b-chat-hf. Port number: 8000. |
| Resource group type | Select the public resource group. |
| Resource configuration method | Select General Resource Configuration. |
| Resource configuration | A GPU type is required. The recommended instance specification is ecs.gn6e-c12g1.3xlarge. The 13B model must run on gn6e or higher specifications; the 7B model can run on A10/GU30 instance types. |
| Additional system disk | Select 50 GB. |

4. Click Deploy and wait for the model deployment to complete.
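As a rough illustration of the run command and resource rows in the parameter table in step 3, the Python sketch below maps a model size to the run command quoted in this article and estimates the memory taken by fp16 weights at 2 bytes per parameter. The helper names `run_command` and `fp16_weight_gb` are hypothetical and not part of PAI or chat-llm-webui. The parameter counts are the nominal 7 and 13 billion from the foreword, and the estimate ignores activations and the KV cache, so treat it as a lower bound rather than a sizing guide.

```python
# Run commands copied from the parameter table in this article.
# Nothing here calls a PAI API; it only assembles strings and does arithmetic.
RUN_COMMANDS = {
    "13b": (
        "python webui/webui_server.py --listen --port=8000 "
        "--model-path=meta-llama/Llama-2-13b-chat-hf --precision=fp16"
    ),
    "7b": (
        "python webui/webui_server.py --listen --port=8000 "
        "--model-path=meta-llama/Llama-2-7b-chat-hf"
    ),
}


def run_command(size: str) -> str:
    """Return the run command for a model size ("7b" or "13b", case-insensitive)."""
    key = size.lower()
    if key not in RUN_COMMANDS:
        raise ValueError(f"unsupported model size: {size!r}")
    return RUN_COMMANDS[key]


def fp16_weight_gb(params_billion: float) -> float:
    """Approximate weight memory in GB (10^9 bytes), assuming 2 bytes per parameter."""
    return params_billion * 2.0


if __name__ == "__main__":
    for size, params in (("7b", 7), ("13b", 13)):
        print(f"{size}: {run_command(size)}")
        print(f"  approx. fp16 weights: {fp16_weight_gb(params):.0f} GB")
```

Under these assumptions the 13B weights alone take nearly twice the memory of the 7B weights, which is consistent with the table recommending a larger GPU specification for the 13B model.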

2. Start the WebUI for model inference

1. Click View Web Application under the Service Mode column of the target service.

2. On the WebUI page, verify model inference.

Enter a message in the input box at the bottom of the dialogue window, for example, "Please provide a financial management learning plan", and click Send to start the conversation.

What's More

  1. This article mainly demonstrates how to quickly fine-tune and deploy Llama2 on the Alibaba Cloud machine learning platform PAI, mainly for the 7B and 13B sizes. In the future, we will show how to fine-tune and deploy the 70B model, Llama-2-70B, on PAI, so stay tuned.
  2. Of the experiments above, [Best Practice 3: Llama2 Rapid Deployment of WebUI] supports free-trial instance types. Click [Read the original text] to go to the Alibaba Cloud User Center, claim the "PAI-EAS" free trial, and then try it in the PAI console.

[Get a free trial of machine learning PAI]

References:

  1. Llama2: Inside the Model https://ai.meta.com/llama/#inside-the-model

  2. Llama 2 Community License Agreement https://ai.meta.com/resources/models-and-libraries/llama-downloads/

  3. HuggingFace Open LLM Leaderboard https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard

  4. Alibaba Cloud Machine Learning Platform PAI https://www.aliyun.com/product/bigdata/learn

Please note that Llama2 is a restricted open-source model developed by an overseas company. Before using it, read and abide by the Llama2 license agreement carefully, especially its restrictive terms (for example, companies with more than 700 million monthly active users must apply for an additional license) and its disclaimers.

Please also abide by the laws and regulations of the applicable country. If you use Llama2 to provide services to the public in China, comply with Chinese laws and regulations, and in particular do not engage in conduct or generate content that endangers the rights and interests of the country, society, or others.
