Fine-tune and deploy the ChatGLM model using Amazon SageMaker


This article walks through an example of how to deploy and fine-tune the ChatGLM model using Amazon SageMaker.

This example mainly includes:

  1. ChatGLM General Introduction

  2. ChatGLM fine-tuning introduction

  3. ChatGLM environment settings

  4. ChatGLM fine-tuning training

  5. ChatGLM deployment testing

Preface

Large language models are artificial-intelligence models based on deep-learning technology; their lineage traces back to early language models and machine-translation systems. Only recently, with the rise of deep learning, have large-scale pre-trained language models begun to attract widespread attention.

Large language models are pre-trained on large-scale text datasets to learn rich linguistic knowledge and contextual understanding. Through pre-training and fine-tuning, they can be applied to a variety of natural-language-processing tasks, such as text generation, machine translation, question answering, and dialogue systems. They have demonstrated impressive performance in many fields and have become an important driving force in the development of artificial-intelligence technology.

ChatGLM General Introduction

The ChatGLM model is a conversational language model open sourced by Tsinghua University that supports bilingual question answering in Chinese and English and is optimized for Chinese. The model is based on the General Language Model (GLM) architecture and has 6.2 billion parameters. Combined with model quantization, it can be deployed locally on consumer-grade graphics cards.

ChatGLM has the following features:

  • Sufficient bilingual pre-training in Chinese and English: ChatGLM was pre-trained on roughly 1T tokens of Chinese and English corpora in a 1:1 ratio, giving it bilingual capability.

  • Optimized model architecture and size: the 2D RoPE positional-encoding implementation has been fixed, and the 6B (6.2 billion) parameter size makes it feasible for researchers and individual developers to fine-tune and deploy ChatGLM themselves.

  • Lower deployment threshold: at FP16 half precision, ChatGLM requires at least 13 GB of GPU memory for inference. Combined with model quantization, this requirement can be further reduced to 10 GB (INT8) or 6 GB (INT4), allowing ChatGLM to be deployed on consumer-grade graphics cards (see the sketch after this list).

  • Longer sequence length: ChatGLM supports a sequence length of 2048, enabling longer conversations and applications.
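
To illustrate the low deployment threshold, here is a minimal local-loading sketch following the usage documented in the ChatGLM repository; the INT4 quantization step is what brings inference memory down to roughly 6 GB. This is separate from the SageMaker flow below.

from transformers import AutoModel, AutoTokenizer

# Load ChatGLM-6B locally with INT4 quantization (~6 GB of GPU memory).
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).quantize(4).half().cuda()
model = model.eval()

# Simple chat test.
response, history = model.chat(tokenizer, "你好", history=[])
print(response)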

ChatGLM fine-tuning introduction

Model fine-tuning approaches fall mainly into Full Fine-Tuning and PEFT (Parameter-Efficient Fine-Tuning). The former updates all parameters of the model, requiring longer training time and more training resources; the latter freezes most parameters and fine-tunes only a small part of the network, commonly via LoRA or P-Tuning v2. For ChatGLM, P-Tuning v2 is selected for model fine-tuning. Its network structure is shown below: trainable prompts/prefixes are added to all Transformer layers.

[Figure: P-Tuning v2 network structure — trainable prefixes added to every Transformer layer]
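
Conceptually, P-Tuning v2 trains only a small prefix encoder while the base model stays frozen. The following is a simplified, hypothetical sketch of the idea (the real implementation is the PrefixEncoder class in ChatGLM's modeling code; the layer count and hidden size below are ChatGLM-6B's):

import torch
import torch.nn as nn

class PrefixEncoder(nn.Module):
    """Simplified sketch: one trainable vector per prefix position,
    expanded into keys and values for every Transformer layer."""
    def __init__(self, pre_seq_len, num_layers, hidden_size):
        super().__init__()
        # 2 * num_layers * hidden_size values per prefix position
        # (a key and a value for each layer).
        self.embedding = nn.Embedding(pre_seq_len, num_layers * 2 * hidden_size)

    def forward(self, prefix_tokens):
        return self.embedding(prefix_tokens)

prefix_tokens = torch.arange(128).unsqueeze(0)   # PRE_SEQ_LEN = 128
encoder = PrefixEncoder(pre_seq_len=128, num_layers=28, hidden_size=4096)
past_key_values = encoder(prefix_tokens)          # shape: (1, 128, 28*2*4096)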

ChatGLM environment settings

Note: the sample code for this project is stored in the repository at the following address:

https://github.com/GlockGao/aws-sagemaker-llm

1. Upgrade Python SDK

pip install --upgrade boto3
pip install --upgrade sagemaker
pip install huggingface_hub
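
A quick sanity check that the upgraded SDKs are the ones being imported:

import boto3
import sagemaker

# Print installed versions to confirm the upgrade took effect.
print(boto3.__version__)
print(sagemaker.__version__)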

2. Obtain runtime resources, including the region, role, account, and S3 bucket

import boto3
import sagemaker
from sagemaker import get_execution_role


sess = sagemaker.Session()
role = get_execution_role()
sagemaker_default_bucket = sess.default_bucket()


account = sess.boto_session.client("sts").get_caller_identity()["Account"]
region = sess.boto_session.region_name
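
It can be worth printing these values to confirm the session resolved the expected account and bucket:

# Verify the runtime context before proceeding.
print(f"Region : {region}")
print(f"Account: {account}")
print(f"Bucket : {sagemaker_default_bucket}")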


ChatGLM fine-tuning training

Prepare for fine-tuning

Clone code

rm -rf ChatGLM-6B
git clone https://github.com/THUDM/ChatGLM-6B.git
cd ChatGLM-6B
git checkout 163f94e160f08751545e3722730f1832d73b92d1


Download dataset

This example uses the ADGEN advertising dataset: given a list of product attributes as input, the model generates an advertising description. The format is as follows:

{
 "content": "类型#上衣版型#宽松版型#显瘦图案#线条衣样式#衬衫衣袖型#泡泡袖衣款式#抽绳",
 "summary": "这件衬衫的款式非常的宽松,利落的线条可以很好的隐藏身材上的小缺点,穿在身上有着很好的显瘦效果。领口装饰了一个可爱的抽绳,漂亮的绳结展现出了十足的个性,配合时尚的泡泡袖型,尽显女性甜美可爱的气息。"
}
# Download the ADGEN dataset
wget -O AdvertiseGen.tar.gz https://cloud.tsinghua.edu.cn/f/b3f119a008264b1cabd1/?dl=1


# Extract the dataset
tar -xzvf AdvertiseGen.tar.gz
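
To confirm the download, you can peek at the first training sample; this sketch assumes train.json is in JSON-lines format (one JSON object per line), which is how AdvertiseGen ships:

import json

# Print the first sample's input attributes and target description.
with open("AdvertiseGen/train.json", "r", encoding="utf-8") as f:
    sample = json.loads(f.readline())
print(sample["content"])
print(sample["summary"])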


Download the original ChatGLM model

from huggingface_hub import snapshot_download
from pathlib import Path




local_cache_path = Path("./model")
local_cache_path.mkdir(exist_ok=True)


model_name = "THUDM/chatglm-6b"


# Download only the model checkpoint and the required config/tokenizer/code files
allow_patterns = ["*.json", "*.pt", "*.bin", "*.model", "*.py"]


model_download_path = snapshot_download(
    repo_id=model_name,
    cache_dir=local_cache_path,
    allow_patterns=allow_patterns,
)


# Get the model files path
import os


local_model_path = None


paths = os.walk(r'./model')
for root, dirs, files in paths:
    for file in files:
        if file == 'config.json':
            # Slice off the trailing 'config.json' (11 characters), keeping the
            # trailing '/' so the path points at the snapshot directory.
            local_model_path = str(os.path.join(root, file))[0:-11]
            print(local_model_path)
if local_model_path is None:
    print("Model download may have failed, please check the prior step!")


Copy model and data to S3

# s5cmd is a high-performance S3 client; download the binary from the
# s5cmd GitHub releases page into the working directory beforehand.
chmod +x ./s5cmd
./s5cmd sync ${local_model_path} s3://${sagemaker_default_bucket}/llm/models/chatglm/original-6B/
./s5cmd sync ./AdvertiseGen/ s3://${sagemaker_default_bucket}/llm/datasets/chatglm/AdvertiseGen/


rm -rf model
rm -rf AdvertiseGen
rm -rf AdvertiseGen.tar.gz


Model fine-tuning

The model is fine-tuned with P-Tuning v2 to strike a balance between cost and effect.

Fine-tuning requires a number of source-code changes; for details, please refer to the git repository linked above.
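
For orientation, here is a hypothetical sketch of what an entry script like sm_ptune_train.py needs to do: read the settings passed in through environment variables and invoke the P-Tuning main program from ChatGLM-6B/ptuning. The actual script, with the full argument list, is in the repository above.

import os
import subprocess

# Abbreviated sketch; the real invocation also passes batch size, learning
# rate, gradient accumulation, etc., and syncs OUTPUT_DIR to S3 afterwards.
subprocess.run([
    'python', 'main.py',
    '--do_train',
    '--train_file',         os.environ['TRAIN_DATASET'],
    '--validation_file',    os.environ['TEST_DATASET'],
    '--prompt_column',      os.environ['PROMPT_COLUMN'],
    '--response_column',    os.environ['RESPONSE_COLUMN'],
    '--model_name_or_path', os.environ['MODEL_NAME_OR_PATH'],
    '--output_dir',         os.environ['OUTPUT_DIR'],
    '--max_steps',          os.environ['TRAIN_STEPS'],
], check=True)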

Model fine-tuning parameters

The key fine-tuning parameters are set as follows:

  1. Prefix length (PRE_SEQ_LEN): 128

  2. Learning rate: 2e-2, which ensures the loss decreases during training

  3. Batch size: 1

  4. Gradient accumulation steps: 16, giving an effective batch size of 16

  5. Training steps: 50. Even with only 50 steps, the fine-tuning effect is already clearly visible.

import time
from sagemaker.huggingface import HuggingFace




PRE_SEQ_LEN=128
LR=2e-2
BATCH_SIZE=1
GRADIENT_ACCUMULATION_STEPS=16
TRAIN_STEPS=50


job_name = f'huggingface-chatglm-finetune-ptuning-{time.strftime("%Y-%m-%d-%H-%M-%S", time.localtime())}'


instance_type  = "ml.g4dn.2xlarge"
instance_count = 1


# Base model S3 location
model_name_or_path = 's3://{}/llm/models/chatglm/original-6B/'.format(sagemaker_default_bucket)


# Output locations for the fine-tuned model
output_dir         = '/opt/ml/model/adgen-chatglm-6b-ft'
model_s3_path      = 's3://{}/llm/models/chatglm/finetune-ptuning-adgen/'.format(sagemaker_default_bucket)


# Environment variables for the training job
environment = {
    'PYTORCH_CUDA_ALLOC_CONF': 'max_split_size_mb:32',
    'TRAIN_DATASET'          : '/opt/ml/input/data/AdvertiseGen/train.json',
    'TEST_DATASET'           : '/opt/ml/input/data/AdvertiseGen/dev.json',
    'PROMPT_COLUMN'          : 'content',
    'RESPONSE_COLUMN'        : 'summary',
    'MODEL_NAME_OR_PATH'     : model_name_or_path,
    'OUTPUT_DIR'             : output_dir,
    'MODEL_OUTPUT_S3_PATH'   : model_s3_path,
    'TRAIN_STEPS'            : '50'
}


inputs = {
   'AdvertiseGen': f"s3://{sagemaker_default_bucket}/llm/datasets/chatglm/AdvertiseGen/"
}


Launch model fine-tuning

huggingface_estimator = HuggingFace(
    entry_point          = 'sm_ptune_train.py',
    source_dir           = './ChatGLM-6B/ptuning',
    instance_type        = instance_type,
    instance_count       = instance_count,
    base_job_name        = job_name,
    role                 = role,
    script_mode          = True,
    transformers_version = '4.26',
    pytorch_version      = '1.13',
    py_version           = 'py39',
    environment          = environment
)


huggingface_estimator.fit(inputs=inputs)
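
Once fit() returns, the job name and artifact locations can be inspected. Note that, as configured above, the fine-tuned checkpoint is synced to model_s3_path by the training script rather than packaged into the default model.tar.gz:

# Inspect the completed training job and artifact locations.
print(huggingface_estimator.latest_training_job.name)
print(huggingface_estimator.model_data)  # default model.tar.gz location
print(model_s3_path)                     # checkpoint synced by the training script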


ChatGLM deployment testing

Model deployment

1. Prepare a dummy model package. The inference script loads the real weights from S3 at runtime, so SageMaker only needs a placeholder model.tar.gz:

!touch dummy
!tar czvf model.tar.gz dummy
assets_dir = 's3://{0}/{1}/assets/'.format(sagemaker_default_bucket, 'chatglm')
model_data = 's3://{0}/{1}/assets/model.tar.gz'.format(sagemaker_default_bucket, 'chatglm')
!aws s3 cp model.tar.gz $assets_dir
!rm -f dummy model.tar.gz


2. Configure model parameters

from sagemaker.pytorch.model import PyTorchModel


model_name                  = None
entry_point                 = 'chatglm-inference-finetune.py'
framework_version           = '1.13.1'
py_version                  = 'py39'
base_model_name_or_path     = 's3://{}/llm/models/chatglm/original-6B/'.format(sagemaker_default_bucket)
finetune_model_name_or_path = 's3://{}/llm/models/chatglm/finetune-ptuning-adgen/adgen-chatglm-6b-ft/checkpoint-50/pytorch_model.bin'.format(sagemaker_default_bucket)


# Environment variables for the model
model_environment  = {
    'SAGEMAKER_MODEL_SERVER_TIMEOUT': '600',
    'SAGEMAKER_MODEL_SERVER_WORKERS': '1',
    'MODEL_NAME_OR_PATH'            : base_model_name_or_path,
    'PRE_SEQ_LEN'                   : '128',
    'FINETUNE_MODEL_NAME_OR_PATH'   : finetune_model_name_or_path,
}


model = PyTorchModel(
    name              = model_name,
    model_data        = model_data,
    entry_point       = entry_point,
    source_dir        = './code',
    role              = role,
    framework_version = framework_version, 
    py_version        = py_version,
    env               = model_environment
)


3. Deploy the fine-tuned model

from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer


endpoint_name         = None
instance_type         = 'ml.g4dn.2xlarge'
instance_count        = 1


predictor = model.deploy(
    endpoint_name          = endpoint_name,
    instance_type          = instance_type, 
    initial_instance_count = instance_count,
    serializer             = JSONSerializer(),
    deserializer           = JSONDeserializer()
)


4. The key model-loading code is shown below: it loads the original ChatGLM model and then the fine-tuned PrefixEncoder parameters, so that the two perform inference together.

import torch
import os


from transformers import AutoConfig, AutoModel, AutoTokenizer


# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)


# Load the new checkpoint (which contains only the PrefixEncoder parameters):
config = AutoConfig.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True, pre_seq_len=128)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", config=config, trust_remote_code=True)
# CHECKPOINT_PATH is the directory of the fine-tuned checkpoint
# (e.g. adgen-chatglm-6b-ft/checkpoint-50), defined elsewhere in the script.
prefix_state_dict = torch.load(os.path.join(CHECKPOINT_PATH, "pytorch_model.bin"))
new_prefix_state_dict = {}
for k, v in prefix_state_dict.items():
    if k.startswith("transformer.prefix_encoder."):
        new_prefix_state_dict[k[len("transformer.prefix_encoder."):]] = v
model.transformer.prefix_encoder.load_state_dict(new_prefix_state_dict)


# Quantize to INT4 to reduce GPU memory, then move to the GPU in half precision.
model = model.quantize(4)
model.half().cuda()
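
For context, here is a hypothetical sketch of how this loading code can plug into the SageMaker PyTorch serving convention used by chatglm-inference-finetune.py (the actual script is in the repository above); the "ask"/"answer" keys match the test request shown later:

# Hypothetical sketch of the SageMaker inference handlers.
def model_fn(model_dir):
    # Run the loading code above and return the artifacts.
    return tokenizer, model

def predict_fn(data, tokenizer_and_model):
    tokenizer, model = tokenizer_and_model
    response, history = model.chat(tokenizer, data["ask"], history=[])
    return {"answer": response}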


Comparison before and after model fine-tuning

1. Model testing

inputs = {
    "ask": "类型#上衣*材质#牛仔布*颜色#白色*风格#简约*图案#刺绣*衣样式#外套*衣款式#破洞"
}


response = predictor.predict(inputs)
print(response["answer"])


2. Compared with the original ChatGLM model, for the same input the fine-tuned model's output reads like advertising copy rather than a plain semantic restatement of the input attributes.

[Figure: output comparison between the original and fine-tuned models]

3. Clean up resources

predictor.delete_endpoint()
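
To also remove the model object that was registered alongside the endpoint:

predictor.delete_model()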

Summary

Large language models are in the ascendant and are changing and affecting the world in many ways. As customers embrace large language models, the Amazon Cloud Technology team continues to explore customer needs and large-language-model technology in depth, so as to better help customers realize their requirements and enhance business value.

The author of this article


Takaiku

A Solutions Architect at Amazon Cloud Technology, mainly responsible for cloud migration for enterprise customers, helping customers with cloud architecture design and technical consulting, and focusing on technical directions such as smart lake house and AI/ML.

