Using GPT-2 to Load the CPM-LM Model and Build a Simple Question-Answering Bot

Introduction

Demo

  • Supports two modes: question answering and ancient-poetry dictation

Quick start

  • You can try this project directly on the Baidu AI Studio platform: link

Qingyuan CPM

  • Qingyuan CPM (Chinese Pretrained Models) is a large-scale pre-trained model open-source project jointly developed by the Beijing Academy of Artificial Intelligence (BAAI) and a research team at Tsinghua University
  • The Qingyuan project builds large-scale pre-trained models with Chinese as the core language
  • The first open-source release includes a pre-trained Chinese language model and a pre-trained knowledge-representation model, which can be widely applied to Chinese natural language understanding, generation tasks, and knowledge computing
  • All models are freely available for download by academia and industry for research use
  • Qingyuan CPM project: official website, GitHub

Load the CPM-LM model parameters

  • With the code below, the GPT-2 model built earlier can load the CPM-LM parameters
  • The CPM-LM parameters converted for the Paddle platform can be downloaded here
  • The official parameters are stored in FP16 half precision, so they must be cast to FP32 when loading
  • Everything else is the same as loading an ordinary model
import paddle
from GPT2 import GPT2Model

# Initialize the GPT-2 model
model = GPT2Model(
    vocab_size=30000,
    layer_size=32,
    block_size=1024,
    embedding_dropout=0.0,
    embedding_size=2560,
    num_attention_heads=32,
    attention_dropout=0.0,
    residual_dropout=0.0)

# Load the CPM-LM parameters (stored in FP16)
state_dict = paddle.load('CPM-LM.pdparams')

# FP16 -> FP32
for param in state_dict:
    state_dict[param] = state_dict[param].astype('float32')

# Load the parameters into the model
model.set_dict(state_dict)

# Switch the model to evaluation mode
model.eval()
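
As a quick sanity check, you can confirm the FP16-to-FP32 cast took effect by printing a few parameter dtypes. This is a minimal sketch; the exact parameter names depend on whatever the checkpoint contains:

# Illustrative sanity check: each parameter should now report float32
for name in list(state_dict)[:3]:
    print(name, state_dict[name].dtype)  # expected: float32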

Q&A Bot Implementation

  • CPM-LM is good at few-shot text generation: given a handful of examples in the prompt, it learns the pattern and generates text that continues it, as shown below
  • Note: the snippets below are for demonstration only; the model object here (with its predict method) is not the GPT2Model instance built above. A concrete predict implementation appears in the "Q&A bot code" section
inputs = '''默写古诗:
日照香炉生紫烟,遥看瀑布挂前川。
飞流直下三千尺,'''
outputs = model.predict(inputs, max_len=10, end_word='\n')
print(inputs+outputs)

Output (translated):

Recite a classical poem:
Sunlight on the incense burner kindles purple smoke; from afar, a waterfall hangs over the stream ahead.
Its torrent plunges straight down three thousand feet, as if the Milky Way were falling from the ninth heaven.

inputs = '''问题:西游记是谁写的?
答案:'''
outputs = model.predict(inputs, max_len=10, end_word='\n')
print(inputs+outputs)

Output (translated):

Question: Who wrote Journey to the West?
Answer: Wu Cheng'en.

inputs = '''小明决定去吃饭,小红继续写作业
问题:去吃饭的人是谁?
答案:'''
outputs = model.predict(inputs, max_len=10, end_word='\n')
print(inputs+outputs)

Output (translated):

Xiao Ming decides to go eat; Xiao Hong keeps doing her homework.
Question: Who is going to eat?
Answer: Xiao Ming

inputs = '''默写英文:
狗:dog
猫:'''
outputs = model.predict(inputs, max_len=10, end_word='\n')
print(inputs+outputs)

Output (translated):

Write the English word:
狗: dog
猫: cat

  • So just by splicing together a few simple few-shot prediction functions, you can build a simple question-answering bot, as sketched below
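
The splicing itself is plain string concatenation: prepend one or two worked examples to the user's input and let the model continue the pattern. A minimal sketch (the function name and example pairs here are illustrative, not taken from the project):

def build_qa_prompt(question):
    # Few-shot prompt: worked examples first, then the new question
    return ('问题:中国的首都是哪里?\n'
            '答案:北京。\n'
            '问题:%s\n'
            '答案:' % question)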

Q&A bot code

  • The code below walks through how the program runs
  • For full details, see my GitHub project CPM-Generate-Paddle
  • The decoding strategy differs from the sampling used by the official open-source project; a sketch of the sampling alternative is shown right after this list, and the full bot script follows it
  • This project decodes with the simplest greedy search, so the output for a given input is always the same
  • This also makes the project unsuitable for generating long article text, because the results are too fixed
import paddle
import argparse
import numpy as np
from GPT2 import GPT2Model, GPT2Tokenizer

# Command-line arguments
parser = argparse.ArgumentParser()
parser.add_argument("--pretrained_model", type=str, required=True, help="path to the CPM-LM parameter file.")
args = parser.parse_args()

# Initialize the GPT-2 model
model = GPT2Model(
    vocab_size=30000,
    layer_size=32,
    block_size=1024,
    embedding_dropout=0.0,
    embedding_size=2560,
    num_attention_heads=32,
    attention_dropout=0.0,
    residual_dropout=0.0)

print('正在加载模型,耗时需要几分钟,请稍后...')  # "Loading the model, this takes a few minutes, please wait..."

# Load the CPM-LM parameters (stored in FP16)
state_dict = paddle.load(args.pretrained_model)

# FP16 -> FP32
for param in state_dict:
    state_dict[param] = state_dict[param].astype('float32')

# Load the parameters into the model
model.set_dict(state_dict)

# Switch the model to evaluation mode
model.eval()

# Load the tokenizer
tokenizer = GPT2Tokenizer(
    'GPT2/bpe/vocab.json',
    'GPT2/bpe/chinese_vocab.model',
    max_len=512)

# Warm up the tokenizer with a dummy encode call
_ = tokenizer.encode('_')

print('模型加载完成.')  # "Model loaded."

# Basic prediction function: greedy decoding, one token at a time
def predict(text, max_len=10):
    # Encode the prompt and run it through the model once
    ids = tokenizer.encode(text)
    input_id = paddle.to_tensor(np.array(ids).reshape(1, -1).astype('int64'))
    output, cached_kvs = model(input_id, use_cache=True)
    # Greedy search: always pick the highest-scoring next token
    nid = int(np.argmax(output[0, -1].numpy()))
    out = [nid]
    for i in range(max_len):
        # Feed only the new token; the cached key/values hold the earlier context
        input_id = paddle.to_tensor(np.array([nid]).reshape(1, -1).astype('int64'))
        output, cached_kvs = model(input_id, cached_kvs, use_cache=True)
        nid = int(np.argmax(output[0, -1].numpy()))
        # Token id 3 is '\n': stop decoding at the end of the line
        if nid == 3:
            break
        out.append(nid)
    print(tokenizer.decode(out))

# Question answering: prepend two worked Q&A examples (few-shot prompt)
# Note: the prompt lines must not be indented, or the spaces become part of the prompt
def ask_question(question, max_len=10):
    predict('''问题:中国的首都是哪里?
答案:北京。
问题:李白在哪个朝代?
答案:唐朝。
问题:%s
答案:''' % question, max_len)

# Ancient-poetry dictation: one complete couplet as the worked example
def dictation_poetry(front, max_len=10):
    predict('''默写古诗:
白日依山尽,黄河入海流。
%s,''' % front, max_len)

# Main loop: 'q' = question-answering mode, 'd' = poetry-dictation mode
mode = 'q'
funs = ask_question
print('输入“切换”更换问答和古诗默写模式,输入“exit”退出')  # type "切换" to switch modes, "exit" to quit
while True:
    if mode == 'q':
        inputs = input("当前为问答模式,请输入问题:")  # Q&A mode: enter a question
    else:
        inputs = input("当前为古诗默写模式,请输入古诗的上半句:")  # dictation mode: enter the first half of a couplet
    if inputs == '切换':
        # Toggle between the two modes
        if mode == 'q':
            mode = 'd'
            funs = dictation_poetry
        else:
            mode = 'q'
            funs = ask_question
    elif inputs == 'exit':
        break
    else:
        funs(inputs)
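
Assuming the script above is saved as, say, qa_bot.py (a hypothetical filename), it can be started by pointing --pretrained_model at the downloaded parameter file:

python qa_bot.py --pretrained_model CPM-LM.pdparams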

Summary

  • With just two simple few-shot prediction functions, we built a simple bot that handles both question answering and ancient-poetry dictation
  • This example shows that CPM-LM's few-shot text generation is quite good, and even its zero-shot performance holds up; a Chinese pre-trained model of this scale really does have a bit of GPT-3 flavor
