NLP (XVIII): An Attempt to Speed Up Model Prediction with ALBERT

Foreword

  In the article NLP (XVII): Deploying a kashgari model with tensorflow-serving, I described how to deploy a deep learning model with tensorflow-serving. In that article I used the kashgari module to implement the classic BERT + Bi-LSTM + CRF architecture, which achieved good recognition results on my annotated time-expression corpus (roughly 2,000 training sentences). It had one shortcoming, however: prediction was too slow, taking about 400 milliseconds per sentence on average, a rate that cannot be tolerated in a production environment or in real applications.
  Looking into why the model is slow, a large part of the time is spent in the call to BERT. BERT is currently the hottest and best-known pre-trained model; although it makes both training and prediction more time-consuming, it is also one of the best modeling tools for small corpora, so BERT is indispensable to this architecture. How, then, can we avoid the long prediction times that a pre-trained model brings?
  In this article I decided to try ALBERT, to verify how much it improves the model's prediction speed in this application; it also counts as my first practical use of ALBERT~

A Brief Introduction to ALBERT

  We might as well take a moment to look at ALBERT. ALBERT is a pre-trained model that was open-sourced only last week; its Github URL is https://github.com/brightmart/albert_zh, and the accompanying paper is at https://arxiv.org/pdf/1909.11942.pdf.
  According to the Github page, ALBERT is pre-trained on a massive Chinese corpus, with fewer model parameters and better results. Take albert_tiny_zh as an example: the file size is 16M with 1.8M parameters, only about 1/25 the size of BERT, and on some NLP tasks its performance is only slightly worse than BERT's, or even better. In this article we will use albert_tiny_zh as the pre-trained model.

Training a Time-Recognition Model with ALBERT

  Our model code is based on the bertNER project on Github, which implements BERT + Bi-LSTM + CRF. I replaced BERT with ALBERT, so the model in this article is ALBERT + Bi-LSTM + CRF: the bert folder is replaced with albert_zh, and the pre-trained model folder chinese_L-12_H-768_A-12 (the pre-trained Chinese BERT checkpoint) is replaced with albert_tiny. Of course, parts of the project's source code also need to be modified so the model can be trained with ALBERT.
  The dataset is a time-expression corpus that I annotated myself, about 2,000 sentences in which the time expressions are labeled: 75% as the training set (the time.train file), 10% as the validation set (the time.dev file), and 15% as the test set (the time.test file). I do not intend to give the specific Python code here, because the project is fairly complex; interested readers can look at the project's Github address.
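The 75/10/15 split described above can be reproduced with a simple script. This is only a sketch: the file names match the project, but the shuffling details and the placeholder corpus are my own assumptions.

```python
import random

def split_corpus(sentences, seed=42):
    """Shuffle and split sentences into 75% train / 10% dev / 15% test."""
    random.Random(seed).shuffle(sentences)
    n = len(sentences)
    n_train = int(n * 0.75)
    n_dev = int(n * 0.10)
    return (sentences[:n_train],                    # -> time.train
            sentences[n_train:n_train + n_dev],     # -> time.dev
            sentences[n_train + n_dev:])            # -> time.test

corpus = [f"sentence {i}" for i in range(2000)]  # placeholder corpus
train, dev, test = split_corpus(corpus)
print(len(train), len(dev), len(test))  # → 1500 200 300
```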
  The main model parameters are as follows:

  • Pre-trained model: ALBERT (tiny)
  • Maximum sequence length of training samples: 128
  • batch_size: 8
  • epoch: 100
  • Bi-LSTM hidden units: 100
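As a quick sanity check on the training schedule implied by these parameters (the sentence counts are approximate, taken from the split above):

```python
import math

n_train = int(2000 * 0.75)   # ≈ 1,500 training sentences (approximate)
batch_size = 8
epochs = 100

# number of optimizer steps per epoch and in total
steps_per_epoch = math.ceil(n_train / batch_size)
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)  # → 188 18800
```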

  Training the ALBERT model is noticeably faster, so we wait patiently for training to finish. The performance on the time.dev and time.test datasets is shown in the following table:

dataset   precision  recall  F1
time.dev  81.41%     84.95%  83.14%
time.test 83.03%     86.38%  84.67%
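The F1 score in the table is the harmonic mean of precision and recall; as a sanity check, the time.dev row above can be reproduced as follows:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# time.dev row from the table above
p, r = 0.8141, 0.8495
f1 = f1_score(p, r)
print(round(f1 * 100, 2))  # → 83.14
```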

  I then wrapped model prediction in an HTTP service using tornado; the specific code is as follows:

# -*- coding: utf-8 -*-

import json
import pickle

import tornado.httpserver
import tornado.ioloop
import tornado.options
import tornado.web
from tornado.options import define, options

import tensorflow as tf
from utils import create_model, get_logger
from model import Model
from loader import input_from_line
from train import FLAGS, load_config, train

# run the service on port 12306
define("port", default=12306, help="run on the given port", type=int)
# load the model configuration and tag mappings
config = load_config(FLAGS.config_file)
logger = get_logger(FLAGS.log_file)
# limit GPU memory: allocate only as much as needed
tf_config = tf.ConfigProto()
tf_config.gpu_options.allow_growth = True
with open(FLAGS.map_file, "rb") as f:
    tag_to_id, id_to_tag = pickle.load(f)

sess = tf.Session(config=tf_config)
model = create_model(sess, Model, FLAGS.ckpt_path, config, logger)

# HTTP handler for model prediction
class ResultHandler(tornado.web.RequestHandler):
    # handle POST requests
    def post(self):
        event = self.get_argument('event')
        result = model.evaluate_line(sess, input_from_line(event, FLAGS.max_seq_len, tag_to_id), id_to_tag)
        self.write(json.dumps(result, ensure_ascii=False))

# main function: start the tornado service
def main():
    tornado.options.parse_command_line()
    # define the app and its URL routing
    app = tornado.web.Application(
        handlers=[
            (r'/subj_extract', ResultHandler)
        ])
    http_server = tornado.httpserver.HTTPServer(app)
    http_server.listen(options.port)
    tornado.ioloop.IOLoop.instance().start()


if __name__ == '__main__':
    main()

Has Model Prediction Gotten Faster?

  After wrapping model prediction as an HTTP service, we use Postman to test the prediction results and timing, as shown below:

As you can see, the model predicts the correct result, and it takes only 38 ms.
  Next we test a few more sentences; the test code is as follows:

# -*- coding: utf-8 -*-
import requests
import json
import time

url = 'http://localhost:12306/subj_extract'

texts = ['据《新闻联播》报道,9月9日至11日,中央纪委书记赵乐际到河北调研。',
         '记者从国家发展改革委、商务部相关方面获悉,日前美方已决定对拟于10月1日实施的中国输美商品加征关税措施做出调整,中方支持相关企业从即日起按照市场化原则和WTO规则,自美采购一定数量大豆、猪肉等农产品,国务院关税税则委员会将对上述采购予以加征关税排除。',
         '据印度Zee新闻网站12日报道,亚洲新闻国际通讯社援引印度军方消息人士的话说,9月11日的对峙事件发生在靠近班公错北岸的实际控制线一带。',
         '儋州市决定,从9月开始,对城市低保、农村低保、特困供养人员、优抚对象、领取失业保险金人员、建档立卡未脱贫人口等低收入群体共3万多人,发放猪肉价格补贴,每人每月发放不低于100元补贴,以后发放标准,将根据猪肉价波动情况进行动态调整。',
         '9月11日,华为心声社区发布美国经济学家托马斯.弗里德曼在《纽约时报》上的专栏内容,弗里德曼透露,在与华为创始人任正非最近一次采访中,任正非表示华为愿意与美国司法部展开话题不设限的讨论。',
         '造血干细胞移植治疗白血病技术已日益成熟,然而,通过该方法同时治愈艾滋病目前还是一道全球尚在攻克的难题。',
         '英国航空事故调查局(AAIB)近日披露,今年2月6日一趟由德国法兰克福飞往墨西哥坎昆的航班上,因飞行员打翻咖啡使操作面板冒烟,导致飞机折返迫降爱尔兰。',
         '当地时间周四(9月12日),印度尼西亚财政部长英卓华(Sri Mulyani Indrawati)明确表示:特朗普的推特是风险之一。',
         '华中科技大学9月12日通过其官方网站发布通报称,9月2日,我校一硕士研究生不幸坠楼身亡。',
         '微博用户@ooooviki 9月12日下午公布发生在自己身上的惊悚遭遇:一个自称网警、名叫郑洋的人利用职务之便,查到她的完备的个人信息,包括但不限于身份证号、家庭地址、电话号码、户籍变动情况等,要求她做他女朋友。',
         '今天,贵阳取消了汽车限购,成为目前全国实行限购政策的9个省市中,首个取消限购的城市。',
         '据悉,与全球同步,中国区此次将于9月13日于iPhone官方渠道和京东正式开启预售,京东成Apple中国区唯一官方授权预售渠道。',
         '根据央行公布的数据,截至2019年6月末,存款类金融机构住户部门短期消费贷款规模为9.11万亿元,2019年上半年该项净增3293.19亿元,上半年增量看起来并不乐观。',
         '9月11日,一段拍摄浙江万里学院学生食堂的视频走红网络,视频显示该学校食堂不仅在用餐区域设置了可以看电影、比赛的大屏幕,还推出了“一人食”餐位。',
         '当日,在北京举行的2019年国际篮联篮球世界杯半决赛中,西班牙队对阵澳大利亚队。',
         ]

t1 = time.time()
# call the service for each test sentence and print the extracted time expressions
for text in texts:
    data = {'event': text.replace(' ', '')}
    req = requests.post(url, data=data)
    if req.status_code == 200:
        print('原文:%s' % text)
        res = json.loads(req.content)['entities']
        print('抽取结果:%s' % str([_['word'] for _ in res]))

t2 = time.time()
print('一共耗时:%ss.' % str(round(t2 - t1, 4)))

Output:

原文:据《新闻联播》报道,9月9日至11日,中央纪委书记赵乐际到河北调研。
抽取结果:['9月9日至11日']
原文:记者从国家发展改革委、商务部相关方面获悉,日前美方已决定对拟于10月1日实施的中国输美商品加征关税措施做出调整,中方支持相关企业从即日起按照市场化原则和WTO规则,自美采购一定数量大豆、猪肉等农产品,国务院关税税则委员会将对上述采购予以加征关税排除。
抽取结果:['日前', '10月1日']
原文:据印度Zee新闻网站12日报道,亚洲新闻国际通讯社援引印度军方消息人士的话说,9月11日的对峙事件发生在靠近班公错北岸的实际控制线一带。
抽取结果:['12日', '9月11日']
原文:儋州市决定,从9月开始,对城市低保、农村低保、特困供养人员、优抚对象、领取失业保险金人员、建档立卡未脱贫人口等低收入群体共3万多人,发放猪肉价格补贴,每人每月发放不低于100元补贴,以后发放标准,将根据猪肉价波动情况进行动态调整。
抽取结果:['9月']
原文:9月11日,华为心声社区发布美国经济学家托马斯.弗里德曼在《纽约时报》上的专栏内容,弗里德曼透露,在与华为创始人任正非最近一次采访中,任正非表示华为愿意与美国司法部展开话题不设限的讨论。
抽取结果:['9月11日']
原文:造血干细胞移植治疗白血病技术已日益成熟,然而,通过该方法同时治愈艾滋病目前还是一道全球尚在攻克的难题。
抽取结果:[]
原文:英国航空事故调查局(AAIB)近日披露,今年2月6日一趟由德国法兰克福飞往墨西哥坎昆的航班上,因飞行员打翻咖啡使操作面板冒烟,导致飞机折返迫降爱尔兰。
抽取结果:['近日', '今年2月6日']
原文:当地时间周四(9月12日),印度尼西亚财政部长英卓华(Sri Mulyani Indrawati)明确表示:特朗普的推特是风险之一。
抽取结果:['当地时间周四(9月12日)']
原文:华中科技大学9月12日通过其官方网站发布通报称,9月2日,我校一硕士研究生不幸坠楼身亡。
抽取结果:['9月12日', '9月2日']
原文:微博用户@ooooviki 9月12日下午公布发生在自己身上的惊悚遭遇:一个自称网警、名叫郑洋的人利用职务之便,查到她的完备的个人信息,包括但不限于身份证号、家庭地址、电话号码、户籍变动情况等,要求她做他女朋友。
抽取结果:['9月12日下午']
原文:今天,贵阳取消了汽车限购,成为目前全国实行限购政策的9个省市中,首个取消限购的城市。
抽取结果:['今天', '目前']
原文:据悉,与全球同步,中国区此次将于9月13日于iPhone官方渠道和京东正式开启预售,京东成Apple中国区唯一官方授权预售渠道。
抽取结果:['9月13日']
原文:根据央行公布的数据,截至2019年6月末,存款类金融机构住户部门短期消费贷款规模为9.11万亿元,2019年上半年该项净增3293.19亿元,上半年增量看起来并不乐观。
抽取结果:['2019年6月末', '2019年上半年', '上半年']
原文:9月11日,一段拍摄浙江万里学院学生食堂的视频走红网络,视频显示该学校食堂不仅在用餐区域设置了可以看电影、比赛的大屏幕,还推出了“一人食”餐位。
抽取结果:['9月11日']
原文:当日,在北京举行的2019年国际篮联篮球世界杯半决赛中,西班牙队对阵澳大利亚队。
抽取结果:['当日', '2019年']
一共耗时:0.5314s.

As you can see, for these 15 test sentences the recognition accuracy is high, and the total prediction time is 531 ms, i.e., less than 40 ms per sentence on average. By comparison, the kashgari model deployed with tensorflow-serving in NLP (XVII) took more than 1 second per sentence; the ALBERT model's prediction speed is about 25 times faster.
  Thus, the ALBERT model does indeed improve prediction time, and the effect is significant.
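A quick back-of-the-envelope check of the per-sentence latency from the run above:

```python
total_seconds = 0.5314   # total time reported above for 15 sentences
n_sentences = 15

# average latency per sentence, in milliseconds
avg_ms = total_seconds / n_sentences * 1000
print(round(avg_ms, 1))  # → 35.4
```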

Summary

  Because ALBERT has been open-source for less than a week, and my knowledge and ability are limited, there may be shortcomings in the code. Still, as a first experience with ALBERT, I hope to share it with everyone.
  This article is by no means a copy-and-paste pile-up of the aforementioned project's code; it incorporates my own thinking, and I hope it will not be mistaken for plagiarism. I used bertNER and ALBERT above only to verify ALBERT's effect on prediction time, but the fact is, ALBERT really gave me a big surprise; interested readers are welcome to study the source code.
  Finally, here is the Github address of this article's project: https://github.com/percent4/ALBERT_4_Time_Recognition.
  Hundreds and thousands of times I searched for him in the crowd; suddenly I turned my head, and there he was, where the lamplight was dim.

References

  1. An ultra-small Chinese BERT is here! Only 16M, trains 10 times faster: https://mp.weixin.qq.com/s/eVlNpejrxdE4ctDTBM-fiA
  2. ALBERT's Github address: https://github.com/brightmart/albert_zh
  3. Github address of the bertNER project: https://github.com/yumath/bertNER
  4. NLP (XVII): Deploying a kashgari model with tensorflow-serving: https://www.cnblogs.com/jclian91/p/11526547.html


Origin www.cnblogs.com/jclian91/p/11701400.html