Introduction
- The previous article showed how to build the GPT-2 model with Paddle 2.0
- This time, we use that model to load the Qingyuan CPM-LM model parameters and build a simple question-answering bot
Results
- Two modes are supported: question answering and classical poetry recitation
Quick start
- You can quickly try this project on the Baidu AI Studio platform: link
Qingyuan CPM
- Qingyuan CPM (Chinese Pretrained Models) is an open-source project for large-scale pretrained models, jointly developed by the Beijing Academy of Artificial Intelligence (BAAI) and a research team at Tsinghua University
- The Qingyuan project focuses on large-scale pretrained models centered on Chinese
- The first phase of open-source releases includes pretrained Chinese language models and pretrained knowledge representation models, which can be used widely in Chinese natural language understanding, generation tasks, and knowledge computing applications
- All models are free to download for research use by academia and industry
- Qingyuan CPM project: official website, GitHub
Load CPM-LM model parameters
- With the following code, the previously built GPT-2 model can load the CPM-LM parameters
- CPM-LM model parameters for the Paddle platform can be downloaded here
- Because the official model parameters are stored in FP16 half precision, they must be converted to FP32 when loading
- Otherwise, loading works the same as for any ordinary model
import paddle
from GPT2 import GPT2Model

# Initialize the GPT-2 model
model = GPT2Model(
    vocab_size=30000,
    layer_size=32,
    block_size=1024,
    embedding_dropout=0.0,
    embedding_size=2560,
    num_attention_heads=32,
    attention_dropout=0.0,
    residual_dropout=0.0)

# Read the CPM-LM model parameters (FP16)
state_dict = paddle.load('CPM-LM.pdparams')

# FP16 -> FP32
for param in state_dict:
    state_dict[param] = state_dict[param].astype('float32')

# Load the CPM-LM model parameters
model.set_dict(state_dict)

# Put the model in evaluation mode
model.eval()
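The FP16 -> FP32 step can be tried in isolation without Paddle. The sketch below is only an illustration: it uses NumPy arrays as stand-ins for the loaded parameters, and the helper name `to_fp32` and the toy parameter names are made up for this example and are not part of the project.

```python
import numpy as np


def to_fp32(state_dict):
    """Return a copy of state_dict with every float16 array cast to float32."""
    converted = {}
    for name, value in state_dict.items():
        if value.dtype == np.float16:
            converted[name] = value.astype('float32')
        else:
            # Leave non-FP16 entries (for example integer buffers) untouched
            converted[name] = value
    return converted


# Toy stand-in for a checkpoint; the names and shapes are hypothetical
fp16_state = {
    'word_embeddings.weight': np.ones((4, 8), dtype=np.float16),
    'position_ids': np.arange(8, dtype=np.int64),
}
fp32_state = to_fp32(fp16_state)

for name, value in fp32_state.items():
    print(name, value.dtype, value.nbytes)
```

Casting from FP16 to FP32 doubles the memory taken by each floating-point parameter, which is worth keeping in mind for a model of this size.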
Q&A Bot Implementation
- CPM-LM is good at few-shot text generation: given a few examples as input, it picks up the pattern and generates matching text, as follows:
- Note: the following code is for demonstration only, and `model` in it is not the model loaded above
inputs = '''默写古诗:
日照香炉生紫烟,遥看瀑布挂前川。
飞流直下三千尺,'''
outputs = model.predict(inputs, max_len=10, end_word='\n')
print(inputs+outputs)
Output (translated):
Recite a classical poem from memory:
Sunlight on Incense Burner Peak kindles purple mist; from afar the waterfall hangs before the river.
Its torrent plunges straight down three thousand feet, as if the Milky Way were falling from the ninth heaven.
inputs = '''问题:西游记是谁写的?
答案:'''
outputs = model.predict(inputs, max_len=10, end_word='\n')
print(inputs+outputs)
Output (translated):
Question: Who wrote Journey to the West?
Answer: Wu Cheng'en.
inputs = '''小明决定去吃饭,小红继续写作业
问题:去吃饭的人是谁?
答案:'''
outputs = model.predict(inputs, max_len=10, end_word='\n')
print(inputs+outputs)
Output (translated):
Xiao Ming decides to go and eat, while Xiao Hong keeps doing her homework
Question: Who went to eat?
Answer: Xiao Ming
inputs = '''默写英文:
狗:dog
猫:'''
outputs = model.predict(inputs, max_len=10, end_word='\n')
print(inputs+outputs)
Output (translated):
Write the English word from memory:
狗 (dog): dog
猫 (cat): cat
- So by simply combining a few few-shot prediction functions like these, you can build a simple question-answering bot
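The few-shot prompts above all follow the same pattern: a few solved examples followed by an unfinished one that the model completes. As a rough sketch of that pattern, the hypothetical helper `build_qa_prompt` below (not part of the project) assembles a question-answering prompt from example pairs, reusing the `问题:` / `答案:` labels from the examples above.

```python
def build_qa_prompt(examples, question):
    """Join solved (question, answer) pairs and leave the last answer open."""
    lines = []
    for q, a in examples:
        lines.append('问题:%s' % q)
        lines.append('答案:%s' % a)
    # The final answer is left empty so the model continues from here
    lines.append('问题:%s' % question)
    lines.append('答案:')
    return '\n'.join(lines)


# Two solved examples followed by the question to be answered
examples = [
    ('中国的首都是哪里?', '北京。'),
    ('李白在哪个朝代?', '唐朝。'),
]
prompt = build_qa_prompt(examples, '西游记是谁写的?')
print(prompt)
```

Keeping the examples in a list makes it easy to experiment with how many solved examples the model needs before its answers become reliable.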
Q&A bot code
- Let's walk through how the program runs by looking at the code
- For the full code details, refer to my GitHub project CPM-Generate-Paddle
- The decoding strategy differs from the sampling used by the official open-source project
- This project decodes with the simplest approach, greedy search, so the same input always produces the same output
- As a result, this project is not suitable for generating article text, because the generated results are too fixed
import paddle
import argparse
import numpy as np
from GPT2 import GPT2Model, GPT2Tokenizer

# Command-line arguments
parser = argparse.ArgumentParser()
parser.add_argument("--pretrained_model", type=str, required=True, help="path to the CPM-LM parameter file.")
args = parser.parse_args()

# Initialize the GPT-2 model
model = GPT2Model(
    vocab_size=30000,
    layer_size=32,
    block_size=1024,
    embedding_dropout=0.0,
    embedding_size=2560,
    num_attention_heads=32,
    attention_dropout=0.0,
    residual_dropout=0.0)

# Message: "Loading the model, this takes a few minutes, please wait..."
print('正在加载模型,耗时需要几分钟,请稍后...')

# Read the CPM-LM model parameters (FP16)
state_dict = paddle.load(args.pretrained_model)

# FP16 -> FP32
for param in state_dict:
    state_dict[param] = state_dict[param].astype('float32')

# Load the CPM-LM model parameters
model.set_dict(state_dict)

# Put the model in evaluation mode
model.eval()

# Load the tokenizer
tokenizer = GPT2Tokenizer(
    'GPT2/bpe/vocab.json',
    'GPT2/bpe/chinese_vocab.model',
    max_len=512)

# Initialize the tokenizer with a dummy encode call
_ = tokenizer.encode('_')

# Message: "Model loaded."
print('模型加载完成.')

# Basic prediction function (greedy decoding)
def predict(text, max_len=10, end_id=3):
    # Token id 3 encodes '\n', which marks the end of the prediction
    ids = tokenizer.encode(text)
    input_id = paddle.to_tensor(np.array(ids).reshape(1, -1).astype('int64'))
    # First pass: feed the whole prompt and keep the key/value cache
    output, cached_kvs = model(input_id, use_cache=True)
    nid = int(np.argmax(output[0, -1].numpy()))
    out = []
    # Stop at '\n' or once max_len tokens have been generated
    while nid != end_id and len(out) < max_len:
        out.append(nid)
        # Later passes: feed only the newest token and reuse the cache
        input_id = paddle.to_tensor(np.array([nid]).reshape(1, -1).astype('int64'))
        output, cached_kvs = model(input_id, cached_kvs, use_cache=True)
        nid = int(np.argmax(output[0, -1].numpy()))
    print(tokenizer.decode(out))
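
# Question answering: two solved examples, then the user's question
def ask_question(question, max_len=10):
    predict('''问题:中国的首都是哪里?
答案:北京。
问题:李白在哪个朝代?
答案:唐朝。
问题:%s
答案:''' % question, max_len)

# Classical poetry recitation: one complete couplet, then the user's first line
def dictation_poetry(front, max_len=10):
    predict('''默写古诗:
白日依山尽,黄河入海流。
%s,''' % front, max_len)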
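
# Main program: 'q' is question-answering mode, 'd' is poetry recitation mode
mode = 'q'
handler = ask_question
# Message: enter "切换" to switch modes, enter "exit" to quit
print('输入“切换”更换问答和古诗默写模式,输入“exit”退出')
while True:
    if mode == 'q':
        # Prompt: "Question-answering mode, please enter a question:"
        inputs = input("当前为问答模式,请输入问题:")
    else:
        # Prompt: "Poetry recitation mode, please enter the first line of a poem:"
        inputs = input("当前为古诗默写模式,请输入古诗的上半句:")
    if inputs == '切换':
        # Toggle between the two modes
        if mode == 'q':
            mode = 'd'
            handler = dictation_poetry
        else:
            mode = 'q'
            handler = ask_question
    elif inputs == 'exit':
        break
    else:
        handler(inputs)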
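
The `predict` function above always takes the `argmax` of the model output, which is why the same input always yields the same output. For comparison, here is a minimal NumPy sketch that places greedy selection next to top-k sampling with temperature. It is an illustration only: the function names, the toy logits, and the choice of top-k sampling are assumptions made for this example, not necessarily what the official project uses.

```python
import numpy as np


def greedy_pick(logits):
    """Deterministic: always return the index of the largest logit."""
    return int(np.argmax(logits))


def top_k_sample(logits, k=3, temperature=1.0, rng=None):
    """Stochastic: sample among the k largest logits after temperature scaling."""
    rng = rng if rng is not None else np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64) / temperature
    # Indices of the k largest logits
    top_ids = np.argsort(logits)[-k:]
    # Softmax over the kept logits (shifted by the max for numerical stability)
    shifted = logits[top_ids] - logits[top_ids].max()
    probs = np.exp(shifted) / np.exp(shifted).sum()
    return int(rng.choice(top_ids, p=probs))


# Toy logits over a 5-token vocabulary (made-up numbers)
logits = [1.0, 3.0, 2.5, 0.1, 2.8]

# Greedy search picks token 1 every time
print([greedy_pick(logits) for _ in range(5)])

# Sampling can pick any of the three highest-scoring tokens (1, 2 or 4)
rng = np.random.default_rng(0)
print([top_k_sample(logits, k=3, rng=rng) for _ in range(5)])
```

Greedy search suits short factual answers where one best continuation is wanted, while sampling trades that determinism for variety, which matters more for longer free-form text.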
Summary
- With just two simple few-shot prediction functions, we built a simple question-answering bot that supports question answering and classical poetry recitation
- This example shows that the CPM-LM model is quite good at few-shot text generation, and even its zero-shot performance is good. A large Chinese pretrained model like this really does have a bit of a GPT-3 flavor