[AWS Series] Use Amazon SageMaker to fine-tune and deploy the ChatGLM model

Preface

Large language models are artificial-intelligence models built on deep learning. Their lineage can be traced back to early language models and machine translation systems, but only recently, with the rise of deep learning, have large-scale pre-trained language models attracted widespread attention.

Large language models are pre-trained on large-scale text datasets to learn rich linguistic knowledge and context-understanding capabilities. Through pre-training and fine-tuning, they can be applied to a wide range of natural language processing tasks such as text generation, machine translation, question answering, and dialogue systems. They have demonstrated impressive performance in many fields and have become an important driving force in the development of artificial intelligence.

This article walks through an example of how to use Amazon SageMaker to fine-tune and deploy the ChatGLM model.

This example mainly includes:

  1. Overall introduction to ChatGLM
  2. Introduction to ChatGLM fine-tuning
  3. ChatGLM environment settings
  4. ChatGLM fine-tuning training
  5. ChatGLM deployment and testing

For more information about Amazon SageMaker, please see the link below: Amazon SageMaker

For more information about Amazon Cloud Technology, please see the link below: Amazon Cloud Technology

1. Overall introduction to ChatGLM

The ChatGLM model is a conversational language model open sourced by Tsinghua University that supports Chinese-English bilingual question answering and is optimized for Chinese. The model is based on the General Language Model (GLM) architecture and has 6.2 billion parameters. Combined with model quantization technology, users can deploy it locally on consumer-grade graphics cards.

ChatGLM has the following features:

  • Sufficient bilingual pre-training in Chinese and English: ChatGLM was pre-trained on about 1T tokens of Chinese and English corpora in a 1:1 ratio, giving it bilingual capabilities.
  • Optimized model architecture and size: the 2D RoPE position-encoding implementation has been fixed, and the 6B (6.2 billion) parameter size makes it feasible for researchers and individual developers to fine-tune and deploy ChatGLM themselves.
  • Lower deployment threshold: at FP16 half precision, ChatGLM requires at least 13 GB of GPU memory for inference; with model quantization this can be further reduced to 10 GB (INT8) or 6 GB (INT4), allowing ChatGLM to be deployed on consumer-grade graphics cards (see the loading sketch after this list).
  • Longer sequence length: ChatGLM supports a sequence length of 2048, enabling longer conversations and applications.
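To illustrate the lower deployment threshold, below is a minimal loading sketch following the usage documented in the ChatGLM-6B repository. It assumes a local GPU with enough memory and the kernels required by ChatGLM's quantization; it is not part of the SageMaker workflow described later.

# Minimal sketch: load ChatGLM-6B with INT4 quantization on a consumer-grade GPU.
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
# quantize(4) needs roughly 6 GB of GPU memory; quantize(8) roughly 10 GB
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).quantize(4).half().cuda()
model = model.eval()

response, history = model.chat(tokenizer, "你好", history=[])
print(response)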

2. Introduction to ChatGLM fine-tuning

Model fine-tuning falls into two broad categories: Full Fine-Tune and PEFT (Parameter-Efficient Fine-Tuning). The former updates all model parameters, which means longer training time and larger resource requirements; the latter freezes most parameters and fine-tunes only a small part of the network, with LoRA and P-Tuning v2 being the common methods. For ChatGLM, P-Tuning v2 is chosen for fine-tuning. Its network structure works by adding trainable prompt/prefix vectors to all layers of the Transformer.
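To make the prefix idea concrete, the following is a simplified, illustrative sketch of a prefix encoder in the spirit of P-Tuning v2 (the class name, shapes, and wiring here are assumptions for illustration, not the exact ChatGLM implementation): a small set of trainable prefix embeddings is projected into per-layer key/value states and prepended to the attention of every Transformer layer, while the backbone weights stay frozen.

import torch
import torch.nn as nn

class PrefixEncoder(nn.Module):
    """Illustrative prefix encoder: maps prefix positions to per-layer key/value states."""
    def __init__(self, pre_seq_len=128, num_layers=28, hidden_size=4096):
        super().__init__()
        # One embedding row per prefix position; each row holds key+value states for every layer.
        self.embedding = nn.Embedding(pre_seq_len, num_layers * 2 * hidden_size)

    def forward(self, prefix_ids):
        # (batch, pre_seq_len) -> (batch, pre_seq_len, num_layers * 2 * hidden_size)
        return self.embedding(prefix_ids)

# During P-Tuning v2, only the prefix encoder's parameters are trained;
# the 6B backbone parameters are frozen.
prefix_encoder = PrefixEncoder()
prefix_ids = torch.arange(128).unsqueeze(0)      # one sequence of prefix positions
past_key_values = prefix_encoder(prefix_ids)     # split/reshaped per layer inside the model
print(past_key_values.shape)                     # torch.Size([1, 128, 229376])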

3. ChatGLM environment settings

Note: the sample code for this article is stored in the code repository at the following link: Code Repository

1. Upgrade Python SDK

pip install --upgrade boto3
pip install --upgrade sagemaker
pip install huggingface_hub

  2. Get runtime resources

Including regions, roles, accounts, S3 buckets, etc.

import boto3
import sagemaker
from sagemaker import get_execution_role

# SageMaker session, execution role, and default S3 bucket
sess = sagemaker.Session()
role = get_execution_role()
sagemaker_default_bucket = sess.default_bucket()

# Account ID and region of the current session
account = sess.boto_session.client("sts").get_caller_identity()["Account"]
region = sess.boto_session.region_name

4. ChatGLM fine-tuning training

4.1 Prepare for fine-tuning

1. Clone the code

rm -rf ChatGLM-6B
git clone https://github.com/THUDM/ChatGLM-6B.git
cd ChatGLM-6B
git checkout 163f94e160f08751545e3722730f1832d73b92d1

2. Download the data set

An example advertising dataset (ADGEN) is used here: given a set of product attributes as input (content), the model generates an advertising description as output (summary). The format is as follows:

{
    "content": "类型#上衣*版型#宽松*版型#显瘦*图案#线条*衣样式#衬衫*衣袖型#泡泡袖*衣款式#抽绳",
    "summary": "这件衬衫的款式非常的宽松,利落的线条可以很好的隐藏身材上的小缺点,穿在身上有着很好的显瘦效果。领口装饰了一个可爱的抽绳,漂亮的绳结展现出了十足的个性,配合时尚的泡泡袖型,尽显女性甜美可爱的气息。"
}

# Download the ADGEN dataset
wget -O AdvertiseGen.tar.gz https://cloud.tsinghua.edu.cn/f/b3f119a008264b1cabd1/?dl=1

# Extract the dataset
tar -xzvf AdvertiseGen.tar.gz
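After extraction, train.json and dev.json each contain one JSON record per line with a content field (product attributes) and a summary field (advertisement text). A quick sketch for inspecting the first few training records, assuming the one-object-per-line layout shown above:

import json

# Print the first three ADGEN training records.
with open("AdvertiseGen/train.json", encoding="utf-8") as f:
    for i, line in enumerate(f):
        record = json.loads(line)
        print("content:", record["content"])
        print("summary:", record["summary"])
        if i >= 2:
            break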

3. Download the original model of ChatGLM

from huggingface_hub import snapshot_download
from pathlib import Path

local_cache_path = Path("./model")
local_cache_path.mkdir(exist_ok=True)

model_name = "THUDM/chatglm-6b"

# Only download pytorch checkpoint files
allow_patterns = ["*.json", "*.pt", "*.bin", "*.model", "*.py"]

model_download_path = snapshot_download(
    repo_id=model_name,
    cache_dir=local_cache_path,
    allow_patterns=allow_patterns,
)


# Get the local path of the downloaded model files
import os

local_model_path = None

for root, dirs, files in os.walk("./model"):
    if "config.json" in files:
        # Keep the directory (with a trailing separator) that contains config.json
        local_model_path = os.path.join(root, "")
        print(local_model_path)

if local_model_path is None:
    print("Model download may have failed, please check the prior step!")

4. Copy the model and data to S3

chmod +x ./s5cmd
./s5cmd sync ${local_model_path} s3://${sagemaker_default_bucket}/llm/models/chatglm/original-6B/
./s5cmd sync ./AdvertiseGen/ s3://${sagemaker_default_bucket}/llm/datasets/chatglm/AdvertiseGen/


rm -rf model
rm -rf AdvertiseGen
rm -rf AdvertiseGen.tar.gz
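s5cmd is used here for fast parallel uploads. If the s5cmd binary is not available in your environment, the same upload can be done from Python with the SageMaker SDK's S3 utilities; a minimal sketch, assuming local_model_path and sagemaker_default_bucket are defined as above:

# Alternative to s5cmd: upload the base model and the dataset with the SageMaker SDK.
from sagemaker.s3 import S3Uploader

S3Uploader.upload(
    local_path=local_model_path,
    desired_s3_uri=f"s3://{sagemaker_default_bucket}/llm/models/chatglm/original-6B",
)
S3Uploader.upload(
    local_path="./AdvertiseGen",
    desired_s3_uri=f"s3://{sagemaker_default_bucket}/llm/datasets/chatglm/AdvertiseGen",
)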

4.2 Model fine-tuning

The model is fine-tuned with P-Tuning v2 to strike a balance between cost and effect. The source-code changes required for fine-tuning are numerous; for details, please refer to the git repository mentioned above.

1. Model fine-tuning parameters

The key fine-tuning parameters are as follows:

  1. Prefix length (PRE_SEQ_LEN): 128
  2. Learning rate: 2e-2, which keeps the loss decreasing during training
  3. Batch size: 1
  4. Gradient accumulation steps: 16
  5. Training steps: 50; even with only 50 steps, a clear fine-tuning effect is already visible

import time
from sagemaker.huggingface import HuggingFace

PRE_SEQ_LEN = 128
LR = 2e-2
BATCH_SIZE = 1
GRADIENT_ACCUMULATION_STEPS = 16
TRAIN_STEPS = 50

job_name = f'huggingface-chatglm-finetune-ptuning-{time.strftime("%Y-%m-%d-%H-%M-%S", time.localtime())}'

instance_type  = "ml.g4dn.2xlarge"
instance_count = 1

# S3 location of the base model
model_name_or_path = 's3://{}/llm/models/chatglm/original-6B/'.format(sagemaker_default_bucket)

# Output locations of the fine-tuned model
output_dir         = '/opt/ml/model/adgen-chatglm-6b-ft'
model_s3_path      = 's3://{}/llm/models/chatglm/finetune-ptuning-adgen/'.format(sagemaker_default_bucket)

# Environment variables for the training job
environment = {
    'PYTORCH_CUDA_ALLOC_CONF': 'max_split_size_mb:32',
    'TRAIN_DATASET'          : '/opt/ml/input/data/AdvertiseGen/train.json',
    'TEST_DATASET'           : '/opt/ml/input/data/AdvertiseGen/dev.json',
    'PROMPT_COLUMN'          : 'content',
    'RESPONSE_COLUMN'        : 'summary',
    'MODEL_NAME_OR_PATH'     : model_name_or_path,
    'OUTPUT_DIR'             : output_dir,
    'MODEL_OUTPUT_S3_PATH'   : model_s3_path,
    'TRAIN_STEPS'            : '50'
}

# Training data channel
inputs = {
    'AdvertiseGen': f"s3://{sagemaker_default_bucket}/llm/datasets/chatglm/AdvertiseGen/"
}
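The entry point sm_ptune_train.py lives in the code repository referenced above; conceptually, it consumes the environment variables set here and launches the P-Tuning script (main.py) from ChatGLM-6B/ptuning, which is shipped as the training source_dir. The following is only a rough, hypothetical sketch of that idea; the actual script in the repository may differ, and the availability of the AWS CLI inside the training container is an assumption.

# Hypothetical sketch of a SageMaker entry script for ChatGLM P-Tuning v2.
import os
import subprocess

model_s3   = os.environ["MODEL_NAME_OR_PATH"]      # s3:// URI of the base ChatGLM model
output_dir = os.environ["OUTPUT_DIR"]              # local output path inside the container
output_s3  = os.environ["MODEL_OUTPUT_S3_PATH"]    # s3:// URI for the fine-tuned checkpoint

# Sync the base model from S3 into the container (assumes the AWS CLI is available).
local_model_dir = "/tmp/chatglm-6b"
subprocess.run(["aws", "s3", "sync", model_s3, local_model_dir], check=True)

# Launch P-Tuning v2 training with the parameters described earlier
# (main.py sits next to this script because source_dir is ChatGLM-6B/ptuning).
subprocess.run([
    "python", "main.py",
    "--do_train",
    "--train_file", os.environ["TRAIN_DATASET"],
    "--validation_file", os.environ["TEST_DATASET"],
    "--prompt_column", os.environ["PROMPT_COLUMN"],
    "--response_column", os.environ["RESPONSE_COLUMN"],
    "--model_name_or_path", local_model_dir,
    "--output_dir", output_dir,
    "--per_device_train_batch_size", "1",
    "--gradient_accumulation_steps", "16",
    "--learning_rate", "2e-2",
    "--pre_seq_len", "128",
    "--max_steps", os.environ["TRAIN_STEPS"],
    "--overwrite_output_dir",
], check=True)

# Copy the fine-tuned checkpoint back to S3 so it can be used at deployment time.
subprocess.run(
    ["aws", "s3", "sync", output_dir, output_s3.rstrip("/") + "/" + os.path.basename(output_dir)],
    check=True,
)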

2. Start model fine-tuning

huggingface_estimator = HuggingFace(
    entry_point          = 'sm_ptune_train.py',
    source_dir           = './ChatGLM-6B/ptuning',
    instance_type        = instance_type,
    instance_count       = instance_count,
    base_job_name        = job_name,
    role                 = role,
    script_mode          = True,
    transformers_version = '4.26',
    pytorch_version      = '1.13',
    py_version           = 'py39',
    environment          = environment
)


huggingface_estimator.fit(inputs=inputs)
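Training runs as a managed SageMaker job; once it finishes, the fine-tuned checkpoint should appear under the model_s3_path configured above. An optional sketch for confirming the upload:

# List the fine-tuned checkpoint files written back to S3 (prefix as configured above).
import boto3

s3 = boto3.client("s3")
response = s3.list_objects_v2(
    Bucket=sagemaker_default_bucket,
    Prefix="llm/models/chatglm/finetune-ptuning-adgen/",
)
for obj in response.get("Contents", []):
    print(obj["Key"], obj["Size"])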

5. ChatGLM deployment test

5.1 Model deployment

1. Prepare a dummy model

Because the inference script loads the actual model weights from S3 (via the environment variables configured below), only a placeholder model.tar.gz needs to be uploaded as the model artifact.

!touch dummy
!tar czvf model.tar.gz dummy
assets_dir = 's3://{0}/{1}/assets/'.format(sagemaker_default_bucket, 'chatglm')
model_data = 's3://{0}/{1}/assets/model.tar.gz'.format(sagemaker_default_bucket, 'chatglm')
!aws s3 cp model.tar.gz $assets_dir
!rm -f dummy model.tar.gz

2. Configure model parameters

from sagemaker.pytorch.model import PyTorchModel


model_name                  = None
entry_point                 = 'chatglm-inference-finetune.py'
framework_version           = '1.13.1'
py_version                  = 'py39'
base_model_name_or_path     = 's3://{}/llm/models/chatglm/original-6B/'.format(sagemaker_default_bucket)
finetune_model_name_or_path = 's3://{}/llm/models/chatglm/finetune-ptuning-adgen/adgen-chatglm-6b-ft/checkpoint-50/pytorch_model.bin'.format(sagemaker_default_bucket)


# Environment variables for the model
model_environment  = {
    'SAGEMAKER_MODEL_SERVER_TIMEOUT': '600',
    'SAGEMAKER_MODEL_SERVER_WORKERS': '1',
    'MODEL_NAME_OR_PATH'            : base_model_name_or_path,
    'PRE_SEQ_LEN'                   : '128',
    'FINETUNE_MODEL_NAME_OR_PATH'   : finetune_model_name_or_path,
}


model = PyTorchModel(
    name              = model_name,
    model_data        = model_data,
    entry_point       = entry_point,
    source_dir        = './code',
    role              = role,
    framework_version = framework_version, 
    py_version        = py_version,
    env               = model_environment
)

3. Deploy fine-tuned model

from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer


endpoint_name         = None
instance_type         = 'ml.g4dn.2xlarge'
instance_count        = 1


predictor = model.deploy(
    endpoint_name          = endpoint_name,
    instance_type          = instance_type, 
    initial_instance_count = instance_count,
    serializer             = JSONSerializer(),
    deserializer           = JSONDeserializer()
)

4. Key model-loading code

The core loading code is shown below: it loads the original ChatGLM model and then loads the fine-tuned PrefixEncoder parameters, so that both are used together for inference.

import torch
import os

from transformers import AutoConfig, AutoModel, AutoTokenizer

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)

# Load a new checkpoint that contains only the PrefixEncoder parameters;
# CHECKPOINT_PATH is the directory holding the fine-tuned checkpoint (e.g. checkpoint-50)
config = AutoConfig.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True, pre_seq_len=128)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", config=config, trust_remote_code=True)
prefix_state_dict = torch.load(os.path.join(CHECKPOINT_PATH, "pytorch_model.bin"))
new_prefix_state_dict = {}
for k, v in prefix_state_dict.items():
    if k.startswith("transformer.prefix_encoder."):
        new_prefix_state_dict[k[len("transformer.prefix_encoder."):]] = v
model.transformer.prefix_encoder.load_state_dict(new_prefix_state_dict)

# Quantize to INT4 and move to the GPU at half precision
model = model.quantize(4)
model.half().cuda()
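For reference, generation with the loaded model would then use the chat interface from the ChatGLM-6B repository, roughly as follows; the exact call inside chatglm-inference-finetune.py may differ.

# Usage sketch: generate a reply with the chat interface provided by ChatGLM-6B.
model = model.eval()
response, history = model.chat(
    tokenizer,
    "类型#上衣*材质#牛仔布*颜色#白色*风格#简约*图案#刺绣*衣样式#外套*衣款式#破洞",
    history=[],
)
print(response)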

5.2 Comparison before and after model fine-tuning

1. Model testing

inputs = {
    "ask": "类型#上衣*材质#牛仔布*颜色#白色*风格#简约*图案#刺绣*衣样式#外套*衣款式#破洞"
}

response = predictor.predict(inputs)
print(response["answer"])
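The endpoint can also be invoked without the SageMaker Python SDK, for example from any application through the SageMaker runtime API; a minimal sketch using the endpoint created above:

# Invoke the deployed endpoint directly through the SageMaker runtime API.
import json
import boto3

runtime = boto3.client("sagemaker-runtime")
payload = {"ask": "类型#上衣*材质#牛仔布*颜色#白色*风格#简约*图案#刺绣*衣样式#外套*衣款式#破洞"}

result = runtime.invoke_endpoint(
    EndpointName=predictor.endpoint_name,   # or the endpoint name shown in the SageMaker console
    ContentType="application/json",
    Body=json.dumps(payload),
)
print(json.loads(result["Body"].read())["answer"])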

Compared with the original ChatGLM model, for the same input the output now reads much more like advertising copy rather than a pure semantic restatement of the input.

2. Clean up resources

predictor.delete_endpoint()
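If the endpoint is no longer needed, the associated SageMaker model object can be removed as well, for example:

# Also remove the SageMaker model object created for this endpoint.
predictor.delete_model()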

6. Summary

Large language models are flourishing and are changing the world in many ways. As customers embrace large language models, the Amazon Cloud Technology team continues to explore both customer needs and large-language-model technology, so as to better help customers realize their requirements and increase business value.

If you are interested in large language models, you can visit the link below to learn more:

Amazon Cloud Technology
