Machine Comprehension Using Match-LSTM and Answer Pointer Paper Reading Notes

English title:

MACHINE COMPREHENSION USING MATCH-LSTM AND ANSWER POINTER

Authors: Shuohang Wang, Jing Jiang

Source of the paper: ICLR 2017

Chinese title:

Machine comprehension using match-LSTM and answer pointers

Download the original paper: pdf download

1. Scientific issues

1.1  The scientific issues involved in this article

This paper falls in the area of machine reading comprehension in natural language processing: the machine is first shown a piece of text, such as a news article or a story, and is then asked to answer one or more questions related to that text.

1.2  How peer experts have addressed it

In some work, a question is treated as a multiple-choice problem in which the correct answer is selected from a provided set of candidates, e.g., Richardson et al. (2013) and Hill et al. (2016). In the SQuAD dataset proposed by Rajpurkar et al. (2016), the correct answer can be any sequence of tokens from the given text. The questions and answers in the datasets of Hermann et al. (2015) and Hill et al. (2016) are created automatically using cloze methods. Hermann et al. (2015), Kadlec et al. (2016), and Cui et al. (2016) assume that the answer is a single token.

1.3  Problems solved in this article

(The problem addressed and the effect achieved, as summarized in the abstract and conclusion)

This paper uses the SQuAD dataset to study machine comprehension of text. The answers in SQuAD do not come from a small set of candidates, and their length is variable. The paper proposes a new model for this problem and evaluates it on SQuAD.

1.4  The effect of this solution

The paper proposes an end-to-end model that combines match-LSTM and Pointer Net, with Pointer Net applied in two different ways. Experiments show that both variants substantially outperform the best result of the logistic regression baseline proposed by Rajpurkar et al. (2016).

 

2. Research content

2.1  Introduction to theories and methods

(The main research content of the paper and its technical approach, theories, and methods)

Main technical theory:

  1. match-LSTM

match-LSTM was first proposed by Wang & Jiang (2016) for the textual entailment task in natural language processing. In textual entailment, two sentences are given: one is designated the premise and the other the hypothesis, and the goal is to predict whether the premise entails the hypothesis. The match-LSTM model traverses the hypothesis token by token. At each position of the hypothesis, an attention mechanism produces a weighted vector representation of the premise. This weighted premise vector is concatenated with the vector representation of the token at the current hypothesis position, and the combined vector is fed into an LSTM. This is the match-LSTM structure.

 

  2. Pointer Net

Pointer Net was proposed by Vinyals et al. (2015) for tasks where the output sequence is constrained to come from the input sequence. Pointer Net uses an attention mechanism as a pointer, selecting a position in the input sequence as each output symbol. A minimal sketch of this "attention as pointer" idea follows.
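As a rough illustration of the pointer idea (a sketch of my own, not code from the paper; the dimensions and random inputs are made up):

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax.
    e = np.exp(x - x.max())
    return e / e.sum()

# Toy setup: encoder states for a 5-token input sequence, dimension 4.
rng = np.random.default_rng(0)
H = rng.normal(size=(5, 4))       # one vector per input position
query = rng.normal(size=(4,))     # decoder state at the current step

# Ordinary attention would blend H into a context vector using these
# weights. Pointer Net instead reads the distribution itself as the
# output: the predicted symbol is a *position* in the input sequence.
scores = H @ query
beta = softmax(scores)
predicted_position = int(np.argmax(beta))
print(beta.round(3), predicted_position)
```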

 

Structure of the proposed model:

Both model variants share a three-layer structure; they differ only in the third layer, the answer pointer layer.

The first layer: preprocessing layer

This layer uses a standard one-directional LSTM to encode the passage and the question separately:

$$\mathbf{H}^p = \overrightarrow{\mathrm{LSTM}}(\mathbf{P}), \qquad \mathbf{H}^q = \overrightarrow{\mathrm{LSTM}}(\mathbf{Q})$$

where $\mathbf{H}^p \in \mathbb{R}^{l \times P}$, $\mathbf{H}^q \in \mathbb{R}^{l \times Q}$, and $l$ is the dimension of the hidden vectors.
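A minimal PyTorch sketch of this layer (my own approximation, not the authors' code; the sizes and random inputs are illustrative):

```python
import torch
import torch.nn as nn

hidden_size = 150            # l, the hidden dimension (one of the paper's settings)
embed_size = 300             # word embedding dimension (assumed value)

# Standard one-directional LSTMs, one applied to the passage and one
# to the question, each producing one hidden vector per token.
lstm_p = nn.LSTM(embed_size, hidden_size, batch_first=True)
lstm_q = nn.LSTM(embed_size, hidden_size, batch_first=True)

P_tokens = torch.randn(1, 20, embed_size)   # passage: 20 tokens
Q_tokens = torch.randn(1, 7, embed_size)    # question: 7 tokens

Hp, _ = lstm_p(P_tokens)     # (1, 20, hidden_size)
Hq, _ = lstm_q(Q_tokens)     # (1, 7, hidden_size)
```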

The second layer: match-LSTM layer

For each passage position $i$, attention weights over the question are computed from the question encodings, the current passage encoding, and the previous match-LSTM state:

$$\overrightarrow{\mathbf{G}}_i = \tanh\!\left(\mathbf{W}^q \mathbf{H}^q + \left(\mathbf{W}^p \mathbf{h}^p_i + \mathbf{W}^r \overrightarrow{\mathbf{h}}^r_{i-1} + \mathbf{b}^p\right) \otimes \mathbf{e}_Q\right), \qquad \overrightarrow{\alpha}_i = \mathrm{softmax}\!\left(\mathbf{w}^\top \overrightarrow{\mathbf{G}}_i + b \otimes \mathbf{e}_Q\right) \tag{2}$$

where $\overrightarrow{\alpha}_i$ is the attention weight vector over the question. The attention-weighted question representation is combined with the passage encoding as

$$\overrightarrow{\mathbf{z}}_i = \begin{bmatrix} \mathbf{h}^p_i \\ \mathbf{H}^q \overrightarrow{\alpha}_i^\top \end{bmatrix}$$

which is then fed into an LSTM:

$$\overrightarrow{\mathbf{h}}^r_i = \overrightarrow{\mathrm{LSTM}}\!\left(\overrightarrow{\mathbf{z}}_i, \overrightarrow{\mathbf{h}}^r_{i-1}\right)$$

This process is also run as a reverse match-LSTM over the passage; the procedure is the same as above, and the reverse direction uses the same parameters as in formula (2). Let the forward hidden states be $\overrightarrow{\mathbf{H}}^r \in \mathbb{R}^{l \times P}$ and the backward hidden states be $\overleftarrow{\mathbf{H}}^r \in \mathbb{R}^{l \times P}$; the final passage representation is

$$\mathbf{H}^r = \begin{bmatrix} \overrightarrow{\mathbf{H}}^r \\ \overleftarrow{\mathbf{H}}^r \end{bmatrix} \in \mathbb{R}^{2l \times P}$$
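The following is a rough PyTorch sketch of one direction of this layer (my own reading of the formulas above, not the authors' implementation; the hidden size, sequence lengths, and random inputs are made up):

```python
import torch
import torch.nn as nn

l = 150                       # hidden size (assumed)
Wq = nn.Linear(l, l, bias=False)
Wp = nn.Linear(l, l, bias=False)
Wr = nn.Linear(l, l, bias=True)
w  = nn.Linear(l, 1, bias=True)
match_cell = nn.LSTMCell(2 * l, l)   # input is [h_p_i ; weighted question]

def match_lstm(Hp, Hq):
    """One direction of the match-LSTM over the passage.
    Hp: (P, l) passage encodings, Hq: (Q, l) question encodings."""
    P = Hp.size(0)
    h = torch.zeros(l)
    c = torch.zeros(l)
    outputs = []
    for i in range(P):
        # Attention over the question, conditioned on the current
        # passage token and the previous match-LSTM state.
        G = torch.tanh(Wq(Hq) + (Wp(Hp[i]) + Wr(h)))        # (Q, l)
        alpha = torch.softmax(w(G).squeeze(-1), dim=0)       # (Q,)
        weighted_q = alpha @ Hq                               # (l,)
        z = torch.cat([Hp[i], weighted_q])                    # (2l,)
        h, c = match_cell(z.unsqueeze(0), (h.unsqueeze(0), c.unsqueeze(0)))
        h, c = h.squeeze(0), c.squeeze(0)
        outputs.append(h)
    return torch.stack(outputs)                               # (P, l)

Hp, Hq = torch.randn(20, l), torch.randn(7, l)
Hr_fwd = match_lstm(Hp, Hq)
Hr_bwd = match_lstm(Hp.flip(0), Hq).flip(0)   # reverse pass, same parameters reused
Hr = torch.cat([Hr_fwd, Hr_bwd], dim=-1)      # (P, 2l)
```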

 

The third layer: answer pointer layer

$\mathbf{H}^r$ serves as the input to the answer pointer layer. This layer uses Pointer Net in two ways: a sequence model and a boundary model.

  1. Sequence model

The sequence model generates a sequence of answer tokens, but these tokens need not be contiguous in the original passage. Let the answer be a sequence of integers $\mathbf{a} = (a_1, a_2, \ldots)$, where each $a_k$ is an integer between 1 and $P$ giving a position in the passage. To allow the model to stop, $a_k$ may also take the value $P+1$; when $a_k = P+1$, answer generation stops, i.e. $P+1$ acts as the end-of-answer marker. To accommodate this, $\mathbf{H}^r$ is extended with a zero column, giving $\widetilde{\mathbf{H}}^r = [\mathbf{H}^r;\,\mathbf{0}] \in \mathbb{R}^{2l \times (P+1)}$.

To select answer positions, an attention mechanism is used. Define the following:

$$\mathbf{F}_k = \tanh\!\left(\mathbf{V}\widetilde{\mathbf{H}}^r + \left(\mathbf{W}^a \mathbf{h}^a_{k-1} + \mathbf{b}^a\right) \otimes \mathbf{e}_{(P+1)}\right)$$

$$\beta_k = \mathrm{softmax}\!\left(\mathbf{v}^\top \mathbf{F}_k + c \otimes \mathbf{e}_{(P+1)}\right)$$

with the answer LSTM state updated as

$$\mathbf{h}^a_k = \overrightarrow{\mathrm{LSTM}}\!\left(\widetilde{\mathbf{H}}^r \beta_k^\top, \mathbf{h}^a_{k-1}\right)$$

Answer sequence:

$$p(a_k = j \mid a_1, \ldots, a_{k-1}, \mathbf{H}^r) = \beta_{k,j}$$

The probability of generating the answer sequence can then be modeled as

$$p(\mathbf{a} \mid \mathbf{H}^r) = \prod_k p(a_k \mid a_1, \ldots, a_{k-1}, \mathbf{H}^r)$$

The model is trained to minimize the following loss function over the $N$ training examples:

$$-\sum_{n=1}^{N} \log p(\mathbf{a}_n \mid \mathbf{P}_n, \mathbf{Q}_n)$$
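A rough PyTorch sketch of the sequence answer pointer (my own approximation of the formulas above, not the authors' code; the hidden size, passage length, and the cap on answer length are assumptions, and decoding is done greedily for simplicity):

```python
import torch
import torch.nn as nn

l = 150                      # hidden size (assumed)
V  = nn.Linear(2 * l, l, bias=False)
Wa = nn.Linear(l, l, bias=True)
v  = nn.Linear(l, 1, bias=True)
ans_cell = nn.LSTMCell(2 * l, l)

def pointer_step(Hr_tilde, h_prev, c_prev):
    """One step of the answer pointer.
    Hr_tilde: (P+1, 2l) match-LSTM states plus a zero 'stop' row."""
    F = torch.tanh(V(Hr_tilde) + Wa(h_prev))               # (P+1, l)
    beta = torch.softmax(v(F).squeeze(-1), dim=0)           # (P+1,)
    context = beta @ Hr_tilde                                # (2l,)
    h, c = ans_cell(context.unsqueeze(0),
                    (h_prev.unsqueeze(0), c_prev.unsqueeze(0)))
    return beta, h.squeeze(0), c.squeeze(0)

P = 20
Hr = torch.randn(P, 2 * l)
Hr_tilde = torch.cat([Hr, torch.zeros(1, 2 * l)], dim=0)    # append stop row
h, c = torch.zeros(l), torch.zeros(l)
answer_positions = []
for _ in range(10):                      # cap on answer length (assumed)
    beta, h, c = pointer_step(Hr_tilde, h, c)
    j = int(beta.argmax())
    if j == P:                           # the extra row means "stop generating"
        break
    answer_positions.append(j)
```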

  2. Boundary model: it only generates the start and end positions of the answer, and then treats all tokens between these two positions in the original passage as the answer.

The basic principle of the boundary model is the same as that of the sequence model; only the prediction probability differs. The boundary model only needs to predict the start position $a_s$ and the end position $a_e$, so the answer probability is modeled as

$$p(\mathbf{a} \mid \mathbf{H}^r) = p(a_s \mid \mathbf{H}^r)\, p(a_e \mid a_s, \mathbf{H}^r)$$
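Continuing the sequence-model sketch above (same caveats, and reusing its `pointer_step`, `Hr`, and `l`), the boundary variant only runs two pointer steps, and no extra "stop" row is needed:

```python
# Boundary model: exactly two pointer steps over Hr itself.
h, c = torch.zeros(l), torch.zeros(l)
beta_s, h, c = pointer_step(Hr, h, c)       # distribution over the start position
beta_e, h, c = pointer_step(Hr, h, c)       # distribution over the end position
start, end = int(beta_s.argmax()), int(beta_e.argmax())
# Greedy argmax for illustration only; it ignores the constraint start <= end
# that a proper search over the span would enforce.
answer_span = list(range(start, end + 1))
```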

 

2.2  Verification analysis and experimental results

(Experimental setup and results reported in the paper)

Dataset:

Training set: 87,599 question-answer pairs; development set: 10,570 question-answer pairs; test set: hidden.

Word vectors are initialized with pretrained GloVe embeddings. The dimensionality of the hidden layers is set to 150 or 300, and the Adamax optimizer is used to train the model.

Model performance is evaluated with two metrics: exact match, i.e., the percentage of predicted answers whose tokens exactly match the tokens of the ground-truth answer, and a word-level F1 score between the predicted and ground-truth answers.
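A simple sketch of these two metrics (illustrative only; the official SQuAD evaluation script additionally lowercases and strips punctuation and articles before comparing):

```python
def exact_match(pred, gold):
    """1 if the predicted answer string equals the gold answer exactly."""
    return int(pred.strip() == gold.strip())

def f1_score(pred, gold):
    """Word-level F1 between predicted and gold answer strings."""
    pred_tokens, gold_tokens = pred.split(), gold.split()
    gold_counts = {}
    for t in gold_tokens:
        gold_counts[t] = gold_counts.get(t, 0) + 1
    common = 0
    for t in pred_tokens:
        if gold_counts.get(t, 0) > 0:
            common += 1
            gold_counts[t] -= 1
    if common == 0:
        return 0.0
    precision = common / len(pred_tokens)
    recall = common / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("match lstm", "match lstm"))           # 1
print(f1_score("the match lstm model", "match lstm"))    # ~0.667
```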

Results:

 

3. Problems in the paper and focus of follow-up research

3.1  Problems with the paper

The model proposed in the paper has more difficulty predicting longer answers. It also has more difficulty answering "why" questions.

3.2  Follow-up research focus

The authors plan to further study different types of questions and focus on those where the model currently underperforms, such as "why" questions. They also plan to test how the model can be applied to other machine comprehension datasets.

 

4. Research results related to this issue

4.1  Related paper 1

(1) Title: Teaching machines to read and comprehend.

(2) Author: Karl Moritz Hermann

(3) Abstract: Teaching machines to read natural language documents remains an elusive challenge. Machine reading systems can be tested on their ability to answer questions posed on the contents of documents that they have seen, but until now large scale training and test datasets have been missing for this type of evaluation. In this work we define a new methodology that resolves this bottleneck and provides large scale supervised reading comprehension data. This allows us to develop a class of attention based deep neural networks that learn to read real documents and answer complex questions with minimal prior knowledge of language structure.

4.2  Related paper 2

(1) Title: Ask Me Anything: Dynamic Memory Networks for Natural Language Processing

(2) Author: Kumar

(3) Abstract: Most tasks in natural language processing can be cast into question answering (QA) problems over language input. We introduce the dynamic memory network (DMN), a neural network architecture which processes input sequences and questions, forms episodic memories, and generates relevant answers. Questions trigger an iterative attention process which allows the model to condition its attention on the inputs and the result of previous iterations. These results are then reasoned over in a hierarchical recurrent sequence model to generate answers. The DMN can be trained end-to-end and obtains state-of-the-art results on several types of tasks and datasets: question answering (Facebook's bAbI dataset), text classification for sentiment analysis (Stanford Sentiment Treebank) and sequence modeling for part-of-speech tagging (WSJ-PTB). The training for these different tasks relies exclusively on trained word vector representations and input-question-answer triplets.

4.3  Related paper 3

(1) Title: Text understanding with the attention sum reader network.

(2) Author: Kadlec

(3) Abstract: Several large cloze-style context-question-answer datasets have been introduced recently: the CNN and Daily Mail news data and the Children's Book Test. Thanks to the size of these datasets, the associated text comprehension task is well suited for deep-learning techniques that currently seem to outperform all alternative approaches. We present a new, simple model that uses attention to directly pick the answer from the context as opposed to computing the answer using a blended representation of words in the document as is usual in similar models. This makes the model particularly suitable for question-answering problems where the answer is a single word from the document. Ensemble of our models sets new state of the art on all evaluated datasets.


Origin blog.csdn.net/Thanours/article/details/95057276