[Computer Science] [2017.06] [Source code included] Language Modeling with Deep Learning

This is a 114-page master's thesis from the Technical University of Denmark (Kongens Lyngby, Denmark).


This thesis investigates the application of Long Short-Term Memory-based Recurrent Neural Networks (RNNs) to the Short Text Conversation problem; specifically, the task of generating answers to medical questions is studied. Training data are gathered from online services containing user-generated questions with associated answers written by professionals. The proposed models are trained and evaluated on data originating from different online services such as WebMD, HealthTap, and iCliniq, from which an optimal dataset is determined. The models proposed to solve the task are mainly inspired by models used in Neural Machine Translation but also contain extensions based on transfer learning and multi-task learning. All are based on the encoder-decoder framework, in which an encoder computes a latent vector representation of a question, from which the state of a decoder is initialized and used to generate an answer, and all are trained end-to-end. One model architecture contains a decoder RNN with two "modes", controlled by a binary input to the decoder: a "language-model" mode, in which the decoder is trained on general medical/health-related text, and an "answer-generating" mode, in which the decoder learns to generate answers to given encoded questions. Another proposed architecture handles answer generation and question-category classification as a combined task: the network classifies the question category from the final state of the encoder and feeds the predicted class as an extra input to the decoder. Finally, a separate RNN-based language model is trained on general medical/health-related text and used to assist the proposed models during inference by merging their probabilities at each time-step, in an attempt to improve the quality of the generated answers.
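The final idea above, merging the decoder's and a separately trained language model's probabilities at each decoding step, can be sketched in a few lines. The abstract does not state the exact merge rule, so the snippet below assumes a simple linear interpolation with a weight `lam` (a common choice, sometimes called shallow fusion); the function names, the weight, and the toy token distributions are all illustrative, not the thesis's actual code.

```python
def merge_step_probs(p_decoder, p_lm, lam=0.5):
    """Interpolate the decoder's and the language model's next-token
    distributions for one time-step: lam weights the decoder,
    (1 - lam) weights the language model."""
    assert p_decoder.keys() == p_lm.keys()
    return {tok: lam * p_decoder[tok] + (1 - lam) * p_lm[tok]
            for tok in p_decoder}

def greedy_token(merged):
    """Pick the highest-probability token from a merged distribution."""
    return max(merged, key=merged.get)

# Toy distributions over a 3-token vocabulary at one decoding step.
p_dec = {"rest": 0.5, "drink": 0.3, "<eos>": 0.2}
p_lm  = {"rest": 0.2, "drink": 0.6, "<eos>": 0.2}

merged = merge_step_probs(p_dec, p_lm, lam=0.5)
print(greedy_token(merged))  # prints "drink"
```

Here the language model shifts the greedy choice from "rest" (the decoder's top token) to "drink", which is the intended effect: the general-text LM nudges the answer toward fluent continuations at each step.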

  1. Introduction
  2. Theory
  3. Methods
  4. Results and Discussion
  5. Conclusion
    Appendix A Related Theory
    Appendix B Methods
    Appendix C Results
    Appendix D Project Progress



Reprinted from blog.csdn.net/weixin_42825609/article/details/104282846