"Natural Language Processing (NLP) paper reading" Dialogue in the Context of Waterloo && reconstruction [Huawei], session state tracking [Hopkins] && Amazon

Source: AINLPer WeChat public account
Edit: ShuYini
proofreading: ShuYini
Time: 2019-8-28

Introduction

    Two papers are shared today. The first targets context reconstruction in Chinese dialogue: the authors split the task into referring-expression detection and coreference resolution, and propose an end-to-end model structure. The second addresses session state tracking in task-oriented dialogue systems and proposes two neural network architectures: a pointer-network structure and a transformer structure.

First Blood

TITLE: End-to-End Neural Context Reconstruction in Chinese Dialogue
Contributor: University of Waterloo && Huawei
Paper: https://www.aclweb.org/anthology/W19-4108
Code: None

Abstract

    This paper addresses context reconstruction in Chinese dialogue: replacing zero pronouns, pronouns, and other referring expressions with the terms they refer to, so that a sentence can be processed on its own even when its context is absent. The reconstruction task is decomposed into referring-expression detection and resolution, and a novel end-to-end structure is proposed to complete both. The main features of the model are a neural utterance encoder, a position encoder, and a novel pronoun-masking mechanism. A long-standing problem in building such models is the lack of training data; to solve it, the authors extend a previously proposed method to generate a large amount of realistic training data. Thanks to the combination of more data and a better model, this approach achieves higher accuracy than state-of-the-art methods on both coreference resolution and end-to-end context reconstruction.

Three Contributions of This Paper

    1. Defines the dialogue context reconstruction problem as one detection problem plus one resolution problem, distinguishes it from traditional coreference resolution, and proposes zero-pronoun detection and candidate selection;
    2. Analyzes how deep neural networks apply to this dialogue task, covering both stepwise and end-to-end methods;
    3. For the context reconstruction task, proposes an effective method to build a large amount of silver-standard training data.

Method Introduction

Principle of the Method

    As shown in the figure, we assume an input utterance q whose context we are trying to rebuild from the other contextual utterances c. In the chat setting, c comes from the preceding utterances of the dialogue. In the coreference dataset, we use the first sentence to locate the context in which the common referent appears. We assume that q and c have already been segmented. Our approach divides context reconstruction into two sub-tasks: detection and resolution.
    Detection is a sequence-tagging task that tries to identify the referring expressions and recover the zero pronouns that need to be resolved. In our running example, 她 (she) is identified, along with a zero pronoun φ (an omitted object).
    Resolution is formulated as a ranking task. For each "slot" that needs to be resolved (she and φ in the figure above), our model ranks triplets (c, q, m), where m ∈ {m_1, …, m_k} is a resolution candidate. Candidate words are noun phrases selected from the context c. At inference time, the highest-scoring candidate m is chosen as the replacement. If multiple slots need to be resolved, our model resolves them from left to right. The final output of the model is shown in the last line of Figure 1.
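The left-to-right ranking step described above can be sketched as follows. This is a minimal illustration only: a dot-product scorer and random vectors stand in for the model's learned representations, and the slot/candidate counts are hypothetical.

```python
import numpy as np

def resolve_slots(slot_reprs, candidate_reprs, score_fn):
    """Resolve each detected slot left to right by ranking the
    candidate mentions and picking the highest-scoring one."""
    resolved = []
    for slot in slot_reprs:
        scores = [score_fn(slot, cand) for cand in candidate_reprs]
        resolved.append(int(np.argmax(scores)))  # index of the best candidate
    return resolved

# Toy example with random embeddings.
rng = np.random.default_rng(0)
slots = rng.normal(size=(2, 8))       # two slots to resolve (e.g. "she" and a zero pronoun)
candidates = rng.normal(size=(3, 8))  # three candidate noun phrases from context c
picked = resolve_slots(slots, candidates, lambda s, c: float(s @ c))
print(picked)  # one candidate index per slot
```

In the actual model the scorer operates on the (c, q, m) triplet encoding rather than a raw dot product; the structure of "rank all candidates, take the argmax, move to the next slot" is the same.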

Model Structure

    The model structure described in the paper is shown below.
    On top of the combined detection and ranking modules, a mask structure is proposed, i.e., a mask is added at the sentence representation layer of the joint model. The mask vector sequence is predicted by the detection module and then applied to the sentence encoding matrix, so that only words close to the pronoun or zero-pronoun slot contribute to the masked sentence representation; max pooling is then applied over the masked matrix to map the sentence into a fixed-length vector. In this way, we force the model to select candidate mentions from words that appear near the pronoun or zero pronoun. These words are usually verbs (such as love, publish) but rarely prepositions (such as through) or adjectives (such as wonderful). Based on the two modules above, we combine the sentence representations to build an end-to-end context reconstruction model in which detection and resolution are trained jointly. The model architecture is shown in detail in the figure.
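The masked max-pooling idea can be sketched with NumPy. The binary mask standing in for the detection module's predictions, the token count, and the hidden dimension are all assumptions of this illustration.

```python
import numpy as np

def masked_max_pool(H, mask):
    """Max-pool a sentence encoding matrix H (seq_len x dim), keeping only
    positions near the detected pronoun / zero-pronoun slots.

    H:    per-token hidden states from the sentence encoder
    mask: 1.0 for tokens near a slot, 0.0 elsewhere (from the detection module)
    """
    neg_inf = np.full_like(H, -1e9)
    # Suppress non-slot positions so they never win the max.
    H_masked = np.where(mask[:, None] > 0, H, neg_inf)
    return H_masked.max(axis=0)  # fixed-length sentence vector

H = np.arange(12, dtype=float).reshape(4, 3)  # 4 tokens, hidden dim 3
mask = np.array([0.0, 1.0, 1.0, 0.0])         # only tokens 1-2 are near a slot
v = masked_max_pool(H, mask)
print(v)  # -> [6. 7. 8.]  (column-wise max over rows 1 and 2)
```

Replacing masked positions with a large negative value before pooling is the standard way to make max pooling ignore them.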

Experimental Results

    Figure: end-to-end coreference resolution results on the CQA dataset.
    Figure: zero-pronoun candidate ranking results on the CoNLL-2012 dataset.
    Figure: end-to-end zero-pronoun resolution results on the OntoNotes dataset.

Double Kill

TITLE: Improving Long Distance Slot Carryover in Spoken Dialogue Systems
Contributor: Johns Hopkins University && Amazon
Paper: https://www.aclweb.org/anthology/W19-4111
Code: None

Abstract

    Session state tracking is a core part of task-oriented dialogue systems. One approach to tracking dialogue state is slot carryover. Previous work on the slot carryover task mainly used models that make an independent decision for each slot, which leads to poor performance over longer dialogue contexts. This paper therefore proposes to model these slots jointly. Two neural network architectures are proposed: one based on a pointer network that incorporates slot ordering information, and another based on a transformer network that uses self-attention to model slot interdependencies. Experimental results on an internal dialogue benchmark dataset and the public DSTC2 dataset show that the proposed models can resolve longer-distance slot references and achieve strong performance.

Two Highlights of This Paper

    1. Improves the slot carryover model architecture of Naik et al. by introducing slot interdependency modeling, and proposes two neural models, based on pointer networks and transformer networks, that make joint predictions over slots.
    2. Provides a detailed analysis of the proposed models on an internal benchmark and a public dataset. Experiments show that contextual slot encoding and modeling slot interdependencies are crucial for improving slot carryover performance in long dialogue contexts, and that the transformer architecture with self-attention delivers the best overall performance.

Model Structure

Overall Architecture

    The figure below shows the general architecture of the contextual carryover model. A Bi-LSTM encodes the utterances in the dialogue into fixed-length dialogue representations, and it can also embed the contextual slot values. The slot encoder creates a fixed-length slot embedding for each candidate slot from the slot key, value, and distance. Given the encoded slots, intents, and dialogue context, the decoder selects the subset of slots relevant to the current user request.

Slot Encoder

    Distance can carry an important signal. Whether this integer is odd or even tells us whether the utterance was produced by the user or the system, and the smaller it is, the closer the slot is to the current utterance, and hence implicitly the more likely it is to be carried over. Based on these observations, we encode the distance as a small vector (x_dist, 4-dimensional) and append it to the overall slot encoding: x = [x_key; x_val; x_dist].
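A minimal sketch of this slot encoding, assuming a hypothetical 4-dimensional distance embedding table (learned in the actual model) and toy 16-dimensional key/value embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical distance embedding table for distances 0..9;
# in the paper this would be a learned lookup.
DIST_EMB = rng.normal(size=(10, 4))

def encode_slot(x_key, x_val, distance):
    """Build the slot encoding x = [x_key; x_val; x_dist] by
    concatenating the key embedding, the value embedding, and
    the small distance vector."""
    return np.concatenate([x_key, x_val, DIST_EMB[distance]])

x_key = np.ones(16)  # toy embedding of the slot key (e.g. a city slot)
x_val = np.ones(16)  # toy embedding of the slot value
x = encode_slot(x_key, x_val, distance=2)
print(x.shape)  # (36,) = 16 + 16 + 4
```

The decoder then scores these fixed-length slot encodings against the dialogue context to decide which slots to carry over.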

Slot Decoder

    **Pointer network decoder:** adopts the pointer network structure (Vinyals et al., 2015) to make joint predictions over the slots to be carried over. A pointer network is a variant of the seq2seq model: instead of transducing the input sequence into another output sequence, it produces a series of soft pointers (attention vectors) over the input sequence, thereby yielding an ordering over the elements of a variable-length input. The model is illustrated below:
    **Self-attention decoder:** like the pointer network, the self-attention mechanism can model the relationships among all slots in a dialogue regardless of their positions. To compute the representation of any given slot, the self-attention model compares it with every other slot in the dialogue. The results of these comparisons are attention scores, which determine how much each of the other slots should contribute to the representation of the given slot.
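The comparison-and-weighting described above can be sketched as single-head scaled dot-product self-attention over the slot embeddings. The dimensions and random inputs are illustrative only, and the real model uses learned query/key/value projections rather than the raw embeddings.

```python
import numpy as np

def slot_self_attention(S):
    """Scaled dot-product self-attention over slot embeddings S (n_slots x d):
    each slot's new representation is a score-weighted mix of all slots."""
    d = S.shape[1]
    scores = S @ S.T / np.sqrt(d)                  # pairwise slot compatibility
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)  # softmax over slots
    return weights @ S                             # contextualized slot representations

S = np.random.default_rng(1).normal(size=(5, 8))   # five candidate slots, dim 8
out = slot_self_attention(S)
print(out.shape)  # (5, 8): one contextualized vector per slot
```

Because every slot attends to every other slot in one step, the distance between two related slots in the dialogue does not limit how strongly they can interact, which is exactly the property the paper exploits for long-distance carryover.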

Experimental Results

    Figure: carryover performance (F1) of different models at different slot distances on the internal dataset.
    Figure: carryover performance (F1) of different models at different slot distances on the DSTC2 dataset.
    Figure: on the internal dataset, model performance (F1) compared across candidate subsets, plotted by the final number of slots after resolution (y-axis) against the number of slots involved in reference resolution (x-axis).

ACED

Attention

For more knowledge about natural language processing, please follow the AINLPer public account, where top-quality content is delivered right away.



Origin blog.csdn.net/yinizhilianlove/article/details/104033154