Get Llama2 up and running fast! Alibaba Cloud Machine Learning PAI best practices (2) - full-parameter fine-tuning training

This practice uses the PAI-DSW module of Alibaba Cloud's machine learning platform PAI to perform full-parameter fine-tuning of Llama-2-7B-Chat. PAI-DSW is an interactive modeling platform, and this practice is suited to developers who need to customize fine-tuned models and want the best tuning results.

Foreword

Recently, Meta open-sourced the large language model Llama2 in three sizes, 7B, 13B, and 70B (7 billion, 13 billion, and 70 billion parameters), each of which also has a dialogue-optimized variant, Llama-2-Chat. Llama2 is free for both research and commercial use (although companies with more than 700 million monthly active users must apply for a license), giving companies and developers a state-of-the-art foundation for large-model research.

At present, Llama-2-Chat surpasses other open source dialogue models on most evaluation benchmarks and is not far behind popular closed source models such as ChatGPT and PaLM. Alibaba Cloud's machine learning platform PAI was among the first to adapt the Llama2 model family, and has published best practices for scenarios such as full-parameter fine-tuning, LoRA fine-tuning, and inference serving, helping AI developers get started quickly. The specific steps are shown below.

[Previous best practice]: Get Llama2 up and running fast! PAI launches best practices (1) - low-code LoRA fine-tuning and deployment

Best practice 2: Llama2 full-parameter fine-tuning training

1. Operating environment requirements

Python 3.9 or later is required, and an A100 (80 GB) GPU is recommended. A100 instances are in short supply, so you may need to retry several times before one becomes available.
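Before starting, it can help to confirm the interpreter meets the version requirement. A minimal sketch (the helper function is ours, not part of PAI):

```python
import sys

def meets_requirements(version_info=sys.version_info):
    # The practice requires Python 3.9 or later.
    return version_info >= (3, 9)

print(meets_requirements())
```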

2. Preparation

1. Log in to PAI and download Llama-2-7B-Chat

a. Log in to the PAI console https://pai.console.aliyun.com/

b. Enter PAI-DSW, create an instance, and download the model files. Running the following code automatically selects an appropriate download address and downloads the model to the current directory.

import os

# Each region has its own OSS mirror; pick the one matching the current DSW instance
dsw_region = os.environ.get("dsw_region")
url_link = {
    "cn-shanghai": "https://atp-modelzoo-sh.oss-cn-shanghai-internal.aliyuncs.com/release/tutorials/llama2/llama2-7b.tar.gz",
    "cn-hangzhou": "https://atp-modelzoo.oss-cn-hangzhou-internal.aliyuncs.com/release/tutorials/llama2/llama2-7b.tar.gz",
    "cn-shenzhen": "https://atp-modelzoo-sz.oss-cn-shenzhen-internal.aliyuncs.com/release/tutorials/llama2/llama2-7b.tar.gz",
    "cn-beijing": "https://atp-modelzoo-bj.oss-cn-beijing-internal.aliyuncs.com/release/tutorials/llama2/llama2-7b.tar.gz",
}

path = url_link[dsw_region]
os.environ['LINK_CHAT'] = path
# Download and unpack the model archive into the current directory
!wget $LINK_CHAT
!tar -zxvf llama2-7b.tar.gz

If your region is not listed above, choose the link for the region closest to you (different regions do not share an intranet, so remember to remove -internal from the link). Downloads within the same region are fast; cross-region downloads also work, just slightly slower.
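Converting one of the intranet links above to its public equivalent is a one-line string replacement; a minimal sketch (the helper name is ours):

```python
def to_public_url(internal_url: str) -> str:
    # Intranet OSS endpoints carry an "-internal" suffix that only resolves
    # inside the same region; removing it yields the public endpoint.
    return internal_url.replace("-internal", "", 1)

url = "https://atp-modelzoo.oss-cn-hangzhou-internal.aliyuncs.com/release/tutorials/llama2/llama2-7b.tar.gz"
print(to_public_url(url))
# -> https://atp-modelzoo.oss-cn-hangzhou.aliyuncs.com/release/tutorials/llama2/llama2-7b.tar.gz
```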

If you prefer to download the model from ModelScope, use this link: https://modelscope.cn/models/modelscope/Llama-2-7b-chat-ms/summary

2. Download and install the environment

Next, download and install the required dependencies.

  • ColossalAI is a large-scale parallel AI training system; this example uses it to fine-tune the model.
  • Transformers is Hugging Face's library of pretrained Transformer-based language models.
  • Gradio is an open source library for quickly building machine learning web demos.
! wget https://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/llama2/ColossalAI.tar.gz
! tar -zxvf ColossalAI.tar.gz
! pip install ColossalAI/.
! pip install ColossalAI/applications/Chat/.
! pip install transformers==4.30.0
! pip install gradio==3.11

3. Download sample training data

Download the data required for training. We provide a creative-generation dataset here, including speech writing and similar content.

You can also prepare your own data in the same format.

! wget https://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/llama2/llama_data.json
! wget https://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/llama2/llama_test.json
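SFT datasets in ColossalAI's examples commonly use an Alpaca-style list of instruction/input/output records. That format is an assumption here, so inspect the downloaded llama_data.json to confirm the exact field names before building your own file. A minimal sketch:

```python
import json

# Hypothetical single training record in the Alpaca-style format
# (field names are an assumption; verify them against llama_data.json)
record = {
    "instruction": "Write an onboarding speech for a new software engineer.",
    "input": "",
    "output": "Hello everyone, I am honored to join the team...",
}

# Write a one-record dataset to disk in the same shape
with open("my_train_data.json", "w", encoding="utf-8") as f:
    json.dump([record], f, ensure_ascii=False, indent=2)
```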

3. Fine-tune the model

Use the provided training script to train the model:

! sh ColossalAI/applications/Chat/examples/train_sft.sh

4. Try out the model

After training completes, download the web UI demo we provide and try out the fine-tuned model (note: replace the model path with the path of the model you trained).

import gradio as gr
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

# Replace the path below with the path of the model you trained
model_path = "/mnt/workspace/sft_llama2-7b"
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True).eval().half().cuda()

# Build the generation pipeline once instead of on every request
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, device='cuda:0', max_new_tokens=400)

def inference(text):
    res = pipe(text)
    # Strip the prompt and return only the generated continuation
    return res[0]['generated_text'][len(text):]

demo = gr.Blocks()
with demo:
    input_prompt = gr.Textbox(label="Enter your request", value="Write an onboarding speech from the perspective of a software engineer.", lines=6)
    generated_txt = gr.Textbox(lines=6)
    b1 = gr.Button("Send")
    b1.click(inference, inputs=[input_prompt], outputs=generated_txt)

demo.launch(enable_queue=True, share=True)

5. Upload the model to OSS and deploy it online

To deploy the model above to PAI-EAS, first upload the trained model to OSS.

Fill in the following parameters according to your own information:

# encoding=utf-8
import os
import oss2

# Fill in your own credentials, endpoint, and model output directory
AK = 'yourAccessKeyId'
SK = 'yourAccessKeySecret'
endpoint = 'yourEndpoint'
model_dir = 'your model output dir'

auth = oss2.Auth(AK, SK)
bucket = oss2.Bucket(auth, endpoint, 'examplebucket')

for filename in os.listdir(model_dir):
    current_file_path = os.path.join(model_dir, filename)
    # Upload each file under its own object key; replace the prefix
    # with the OSS path you want the model stored under
    object_key = 'your-target-oss-dir/' + filename
    bucket.put_object_from_file(object_key, current_file_path)
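When building object keys, note that plain string concatenation of a directory and a filename silently drops the separator. A small hypothetical helper sidesteps that (the "sft_llama2-7b" prefix is illustrative; use any path in your bucket):

```python
import posixpath

def oss_object_key(prefix: str, filename: str) -> str:
    # OSS object keys use forward slashes regardless of the local OS;
    # posixpath.join avoids the missing-separator bug of concatenation
    # (e.g. "models" + "config.json" -> "modelsconfig.json").
    return posixpath.join(prefix, filename)

print(oss_object_key("sft_llama2-7b", "config.json"))
# -> sft_llama2-7b/config.json
```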

What's More

This article demonstrated how to quickly fine-tune and deploy Llama2 on Alibaba Cloud's machine learning platform PAI, mainly for the 7B and 13B sizes. In the future we will show how to fine-tune and deploy the 70B-size Llama-2-70B on PAI, so stay tuned.

[Get a free trial of machine learning PAI]

[Previous best practice]: Get Llama2 up and running fast! PAI launches best practices (1) - low-code LoRA fine-tuning and deployment

References:

  1. Llama2: Inside the Model https://ai.meta.com/llama/#inside-the-model
  2. Llama 2 Community License Agreement https://ai.meta.com/resources/models-and-libraries/llama-downloads/
  3. HuggingFace Open LLM Leaderboard https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
  4. Alibaba Cloud Machine Learning Platform PAI: https://www.aliyun.com/product/bigdata/learn

A reminder: Llama2 is an open source model with usage restrictions, developed by a foreign company. Before using it, please carefully read and comply with Llama2's license agreement, especially its restrictive terms (for example, companies with more than 700 million monthly active users must apply for an additional license) and its disclaimers.

In addition, please comply with the laws and regulations of the applicable country. If you use Llama2 to provide services to the public in China, please comply with national laws and regulations, and in particular do not engage in or generate behavior or content that harms the rights and interests of the state, society, or others.


Origin my.oschina.net/u/5583868/blog/10091227