Get Up and Running with Llama2 Fast! Alibaba Cloud Machine Learning Platform PAI Launches Best Practices

Foreword

Recently, Meta announced the open-source release of the large language model Llama2, available in 7B, 13B, and 70B sizes (7 billion, 13 billion, and 70 billion parameters), each with a dialogue-optimized variant, Llama-2-Chat. Llama2 is free for both research and commercial use (although companies with more than 700 million monthly active users must apply for a license), giving companies and developers the latest tool for large-model research.

At present, Llama-2-Chat surpasses other open source dialogue models on most evaluation benchmarks and is not far behind popular closed-source models such as ChatGPT and PaLM. Alibaba Cloud's machine learning platform PAI was among the first to adapt the Llama2 model series, introducing best practices for scenarios such as full-parameter fine-tuning, LoRA fine-tuning, and inference serving to help AI developers get started quickly. The specific steps for each are shown below.

Best Practice 1: Llama2 low-code LoRA fine-tuning and deployment

  • This practice uses the QuickStart module of the Alibaba Cloud machine learning platform PAI to fine-tune and deploy Llama-2-7b-chat. PAI QuickStart supports the whole low-code workflow of training, deployment, and inference based on open source models, and is suitable for developers who want an out-of-the-box experience with pre-trained models.

1. Preparation

1. Enter the PAI QuickStart page

a. Log in to the PAI console  https://pai.console.aliyun.com/

b. Enter the PAI workspace, and find "Quick Start" in the left navigation bar.

2. Select the Llama2 model

PAI QuickStart includes many popular open source models from different sources, covering different AI domains and tasks. In this example, select "Generative AI - Large Language Model (large-language-model)" to enter the model list page.

On the model list page, you can see mainstream models from several open source communities. In this demonstration, we'll use the llama-2-7b-chat-hf model (the steps are the same for llama-2-7b-hf). You are also free to choose another model that suits your current business needs.

Tips:

  • Generally speaking, a model with more parameters performs better, but the cost of running it and the amount of data required for fine-tuning are correspondingly higher.
  • The Llama-2 13B and 70B versions, as well as other open source large language models, will also be launched on PAI QuickStart, so stay tuned.

2. Online model inference

The llama-2-7b-chat-hf model provided by QuickStart is derived from HuggingFace. It is a large language model based on the Transformer architecture, trained on a variety of mixed open source datasets, which makes it suitable for most non-specialized English scenarios. We can deploy this model directly to PAI-EAS through PAI to create an inference service.

1. Deploy the model

Through the deployment entry on the model details page, you can create an online inference service based on this model with one click. All parameters are configured with sensible defaults, and you are also free to choose the computing resources and other settings; the model is deployed to PAI-EAS as an inference service.

Please note that the model requires at least 64 GiB of memory and 24 GiB or more of GPU memory. Ensure that the computing resources you choose meet these requirements, or the deployment may fail.

On the service details page, you can view the deployment status of the inference service. When the service status is "Running", it means that the inference service has been successfully deployed.

Tips:

  • You can later click the "Manage Tasks and Deployment" button in PAI QuickStart to return to the current inference service.

2. Call the inference service

After the deployment succeeds, you can use the WebUI to quickly debug your service and send a prediction request.

The WebUI also supports API calls; click "Use via API" at the bottom of the WebUI page to view the related documentation.
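
For reference, the sketch below shows one typical way to call a PAI-EAS service over HTTP from Python. The endpoint URL, token, and request payload here are placeholders and assumptions for illustration; copy the real endpoint and token from the service's invocation information, and check the "Use via API" documentation for the actual request schema.

import requests

# Placeholder values: copy the real endpoint and token from the
# EAS service details page.
EAS_ENDPOINT = "http://<your-service-endpoint>/api/predict/<service_name>"
EAS_TOKEN = "<your-service-token>"

# The payload schema below is an assumption for illustration; the actual
# format is described in the "Use via API" page of your deployed service.
payload = {"prompt": "Hello, please introduce yourself.", "max_new_tokens": 200}

response = requests.post(
    EAS_ENDPOINT,
    headers={"Authorization": EAS_TOKEN},
    json=payload,
    timeout=60,
)
response.raise_for_status()
print(response.text)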

3. Fine-tune the model

The llama-2-7b-chat-hf model is suitable for most non-specialized scenarios. When you need to apply domain-specific knowledge, you can fine-tune the model to improve its capability in your custom domain.

Tips:

  • A large language model can also pick up relatively simple knowledge directly during a conversation, so decide whether training is needed based on your requirements.
  • The training method currently supported by QuickStart is based on LoRA. Compared with other training methods (such as full-parameter SFT), LoRA significantly reduces training cost and time, but LoRA training of large language models can be less stable; a conceptual sketch follows below.
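
To make the cost difference concrete, the following minimal PyTorch sketch illustrates the idea behind LoRA: the pretrained weight matrix is frozen, and only a low-rank update (two small matrices A and B) is trained, so the number of trainable parameters drops dramatically. This is a conceptual illustration only, not the actual training code used by QuickStart.

import torch.nn as nn

class LoRALinear(nn.Module):
    """Conceptual LoRA layer: y = base(x) + scale * B(A(x))."""
    def __init__(self, in_dim, out_dim, rank=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_dim, out_dim, bias=False)
        self.base.weight.requires_grad = False              # frozen pretrained weight
        self.lora_a = nn.Linear(in_dim, rank, bias=False)   # trainable, low rank
        self.lora_b = nn.Linear(rank, out_dim, bias=False)  # trainable, low rank
        nn.init.zeros_(self.lora_b.weight)                  # update starts as a no-op
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))

layer = LoRALinear(4096, 4096, rank=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable params: {trainable} / {total}")  # ~65K of ~16.8M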

1. Prepare data

Tips:

  • To make it easy to try out the Llama 2 model, we have prepared a default Instruction Tuning dataset in the llama-2-7b-chat-hf model card, which you can use directly for fine-tuning training.

The model supports training with data stored on OSS. The training data is accepted in JSON format; each record consists of a question, an answer, and an id, represented by the "instruction", "output", and "id" fields, for example:

[
    {
        "instruction": "以下文本是否属于世界主题?为什么美国人很少举行阅兵?",
        "output": "是",
        "id": 0
    },
    {
        "instruction": "以下文本是否属于世界主题?重磅!事业单位车改时间表已出!",
        "output": "不是",
        "id": 1
    }
]

The specific format of the training data can also be found on the model introduction page in PAI QuickStart.
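
Before uploading, you can sanity-check that your file matches this schema with a few lines of Python (the file name below is a placeholder):

import json

# Placeholder file name: point this at your own training data file.
with open("train.json", encoding="utf-8") as f:
    data = json.load(f)

for i, record in enumerate(data):
    missing = {"instruction", "output", "id"} - set(record)
    assert not missing, f"record {i} is missing fields: {missing}"
print(f"OK: {len(data)} records")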

For how to upload data to OSS and view the uploaded data, please refer to the OSS help documentation: https://help.aliyun.com/document_detail/31883.html

To better evaluate the effect of model training, in addition to the training dataset it is also recommended that you prepare a validation dataset: it will be used during training to evaluate the model and to guide tuning of the training parameters.
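
If you only have a single dataset, one simple way to carve out a validation set is a random split, for example 90/10. A minimal sketch (file names are placeholders):

import json
import random

with open("data.json", encoding="utf-8") as f:  # your full dataset
    data = json.load(f)

random.seed(42)  # fixed seed for a reproducible split
random.shuffle(data)
cut = int(len(data) * 0.9)

with open("train.json", "w", encoding="utf-8") as f:
    json.dump(data[:cut], f, ensure_ascii=False, indent=2)
with open("valid.json", "w", encoding="utf-8") as f:
    json.dump(data[cut:], f, ensure_ascii=False, indent=2)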

2. Submit a training job

After preparing the datasets, you can configure them and submit the training job on the model page of QuickStart. Optimized hyperparameters and a computing resource configuration for the training job are provided by default, and you can modify them to match your actual business.

On the training job details page, you can view the execution progress of the training task, task logs, and model evaluation information. When the status of the training task is "Success", the model produced by the training job will be saved to OSS (see "Model Output Path" on the job details page).

Tips:

  • With the default dataset, default hyperparameters, and the default computing resources, training is estimated to take about 1 hour 30 minutes. With custom training data and configuration, the training time may vary, but it should generally finish within a few hours.
  • If you close the page midway, you can click the "Manage Tasks and Deployment" button in PAI QuickStart to return to the current training task at any time.

3. Deploy the fine-tuned model

After fine-tuning succeeds, you can deploy the resulting model as an inference service directly from the job details page. For the specific deployment and service invocation process, refer to the "Deploy the model" section above.

Best Practice 2: Llama2 full-parameter fine-tuning

  • This practice uses the PAI-DSW module of the Alibaba Cloud machine learning platform to perform full-parameter fine-tuning of Llama-2-7B-Chat. PAI-DSW is an interactive modeling platform, and this practice is suitable for developers who need to customize fine-tuned models and pursue the best tuning results.

1. Operating environment requirements

Python 3.9 or later is required, and an A100 (80 GB) GPU is recommended. This resource is in high demand, so you may need to retry several times to obtain an instance.

2. Preparation

1. Log in to PAI and download Llama-2-7B-Chat

a. Log in to the PAI console  https://pai.console.aliyun.com/

b. Enter PAI-DSW to create an instance and download the model file.

To download the model from ModelScope, please use this link: https://modelscope.cn/models/modelscope/Llama-2-7b-chat-ms/summary

2. Download and install the environment

Next, download and install the required dependencies:

  • ColossalAI is a large-scale parallel AI training system; we use this framework for model fine-tuning in this example.
  • Transformers is a library of pretrained models built around the Transformer architecture.
  • Gradio is an open source library for quickly building machine learning web presentation pages.
! wget https://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/llama2/ColossalAI.tar.gz
! tar -zxvf ColossalAI.tar.gz
! pip install ColossalAI/.
! pip install ColossalAI/applications/Chat/.
! pip install transformers==4.30.0
! pip install gradio==3.11
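
After installation, a quick sanity check confirms that the packages import cleanly and are at the expected versions:

import transformers
import gradio
import colossalai  # verify the ColossalAI install imports without errors

print("transformers:", transformers.__version__)  # expect 4.30.0
print("gradio:", gradio.__version__)              # expect 3.11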

3. Download sample training data

Download the data required for training. Here we provide a sample of creative-generation data, including speech writing and similar content.

You can also refer to this format and prepare the required data yourself.

! wget https://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/llama2/llama_data.json
! wget https://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/llama2/llama_test.json

3. Fine-tuning the model

You can use the provided training script for model training.
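
Before launching it, you may want to inspect the script to confirm that the model path, data path, and hyperparameters match your environment:

! cat ColossalAI/applications/Chat/examples/train_sft.sh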

! sh ColossalAI/applications/Chat/examples/train_sft.sh

4. Try the model

After model training is complete, download the WebUI demo we provide and try out the fine-tuned model (note: replace the model path with the path of the model you trained).

import gradio as gr
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

# Replace the model path with the path of your own fine-tuned model.
model_path = "/mnt/workspace/sft_llama2-7b"
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True).eval().half().cuda()

# Build the text-generation pipeline once, instead of on every request.
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, device="cuda:0", max_new_tokens=400)

def inference(text):
    res = pipe(text)
    # Return only the completion, stripping the prompt from the output.
    return res[0]["generated_text"][len(text):]

demo = gr.Blocks()
with demo:
    # UI labels are in Chinese: "请输入需求" means "enter your request"; the default
    # prompt asks for an onboarding speech written as a software engineer.
    input_prompt = gr.Textbox(label="请输入需求", value="请以软件工程师的身份,写一篇入职的发言稿。", lines=6)
    generated_txt = gr.Textbox(lines=6)
    b1 = gr.Button("发送")  # "Send"
    b1.click(inference, inputs=[input_prompt], outputs=generated_txt)
demo.launch(enable_queue=True, share=True)

5. Upload the model to OSS and deploy it online

If you want to deploy the above model to PAI-EAS, you need to upload the trained model to OSS first.

Fill in the following parameters according to your own information:

# encoding=utf-8
import os
import oss2

# Fill in the following values with your own information.
AK = 'yourAccessKeyId'               # your AccessKey ID
SK = 'yourAccessKeySecret'           # your AccessKey Secret
endpoint = 'yourEndpoint'            # the OSS endpoint of your region
local_dir = 'your model output dir'  # local directory containing the trained model

auth = oss2.Auth(AK, SK)
bucket = oss2.Bucket(auth, endpoint, 'examplebucket')  # replace with your bucket name

for filename in os.listdir(local_dir):
    local_path = os.path.join(local_dir, filename)
    # Destination object key in OSS; replace the prefix with your target path.
    object_key = 'your/target/path/' + filename
    bucket.put_object_from_file(object_key, local_path)

The next step is deployment; please refer to [Best Practice 3: Llama2 rapid WebUI deployment].

Best Practice 3: Llama2 rapid WebUI deployment

  • This practice uses the PAI-EAS module of the Alibaba Cloud machine learning platform to deploy Llama-2-13B-chat. PAI-EAS is an online model serving platform that supports one-click deployment of models as online inference services or AI-Web applications, provides elastic scaling, and is suitable for developers who need cost-effective model serving.

1. Service Deployment

1. Enter the PAI-EAS model online service page.

    1. Log in to the PAI console  https://pai.console.aliyun.com/
    2. Click the workspace list in the left navigation bar, and on the workspace list page click the name of the workspace you want to enter.
    3. In the left navigation bar of the workspace page, select Model Deployment > Model Online Service (EAS) to enter the PAI-EAS model online service page.

2. On the PAI-EAS model online service page, click Deploy Service.

3. On the deployment service page, configure the following key parameters.

  • Service name: a custom service name. The example value used in this case is chatllm_llama2_13b.
  • Deployment method: select image-based deployment of an AI-Web application.
  • Image selection: select chat-llm-webui from the PAI platform image list, with image version 1.0. Since versions iterate quickly, you can also select the latest image version when deploying.
  • Run command: for the 13b model, use: python webui/webui_server.py --listen --port=8000 --model-path=meta-llama/Llama-2-13b-chat-hf --precision=fp16; for the 7b model, use: python webui/webui_server.py --listen --port=8000 --model-path=meta-llama/Llama-2-7b-chat-hf. Port number: 8000.
  • Resource group type: select the public resource group.
  • Resource allocation method: select general resource configuration.
  • Resource configuration: a GPU type must be selected; the instance type ecs.gn6e-c12g1.3xlarge is recommended. The 13b model must run on gn6e or higher instance types; the 7b model can run on A10/GU30 instances.
  • Additional system disk: 50 GB.

4. Click Deploy and wait a moment for the model deployment to complete.

2. Start the WebUI for model inference

1. Click View Web Application under the Service Mode column of the target service.

2. On the WebUI page, verify model inference.

Enter your prompt in the input field at the bottom of the dialogue box, for example "Please provide a financial management learning plan", and click Send to start the conversation.

What's More

  1. This article demonstrates how to quickly fine-tune and deploy Llama2 on the Alibaba Cloud machine learning platform PAI, mainly for the 7B and 13B sizes. In the future, we will show how to fine-tune and deploy the 70B size, Llama-2-70B, on PAI, so stay tuned.
  2. Among the above experiments, [Best Practice 3: Llama2 rapid WebUI deployment] supports a free trial. Please go to the Alibaba Cloud User Center to claim a free trial of PAI-EAS, and then go to the PAI console to try it out.

References:

  1. Llama2: Inside the Model: https://ai.meta.com/llama/#inside-the-model
  2. Llama 2 Community License Agreement: https://ai.meta.com/resources/models-and-libraries/llama-downloads/
  3. HuggingFace Open LLM Leaderboard: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
  4. Alibaba Cloud Machine Learning Platform PAI: https://www.aliyun.com/product/bigdata/learn
※ Special reminder: Llama2 is a restricted open-source model developed by a foreign company. Before using it, please be sure to carefully read and abide by the Llama2 license agreement, especially its restrictive license terms (for example, companies with more than 700 million monthly active users must apply for an additional license) and its disclaimer clauses.
In addition, please abide by the laws and regulations of the applicable country. If you use Llama2 to provide services to the public in China, please comply with the laws and regulations of that country; in particular, do not engage in or generate behaviors or content that endanger the rights and interests of the state, society, or others.

Click to try cloud products for free and start your hands-on journey on the cloud!

This article is the original content of Alibaba Cloud and may not be reproduced without permission.
