Reading Note: Gated Self-Matching Networks for Reading Comprehension and Question Answering

Abstract

The authors present gated self-matching networks for reading-comprehension-style question answering, which aims to answer questions from a given passage.

First, match the question and passage with gated attention-based recurrent networks to obtain the question-aware passage representation.
Then, apply a self-matching attention mechanism to refine the representation by matching the passage against itself.
Finally, employ pointer networks to locate the positions of the answers in the passage.

Introduction

This model (R-Net) consists of four parts:
1. the recurrent network encoder (to build representations for the question and passage separately)
2. the gated matching layer (to match the question and passage)
3. the self-matching layer (to aggregate information from the whole passage)
4. the pointer network layer (to predict the answer boundary)

Three-fold key contributions:
1. propose a gated attention-based recurrent network, assigning different levels of importance to passage parts depending on their relevance to the question
2. introduce a self-matching mechanism, effectively aggregating evidence from the whole passage to infer the answer and dynamically refining passage representation with information from the whole passage
3. yield state-of-the-art results against strong baselines

Task Description

Given a passage P and a question Q, predict an answer A to question Q based on the information in P.

Methods

(Figure: R-NET / Gated Self-Matching Networks structure overview)

Question and Passage Encoder

Consider a question $Q = \{w_t^Q\}_{t=1}^m$ and a passage $P = \{w_t^P\}_{t=1}^n$. First convert the words to word-level embeddings ($\{e_t^Q\}_{t=1}^m$ and $\{e_t^P\}_{t=1}^n$) and character-level embeddings ($\{c_t^Q\}_{t=1}^m$ and $\{c_t^P\}_{t=1}^n$), where the character-level embedding of a token is the final hidden state of a bi-directional recurrent neural network applied to the embeddings of its characters. Such character-level embeddings have been shown to help deal with out-of-vocabulary tokens.

Then use a bi-directional RNN to produce the new representations $\{u_t^Q\}_{t=1}^m$ and $\{u_t^P\}_{t=1}^n$:

$$u_t^Q = \text{BiRNN}_Q(u_{t-1}^Q, [e_t^Q, c_t^Q]), \qquad u_t^P = \text{BiRNN}_P(u_{t-1}^P, [e_t^P, c_t^P])$$

Here, the Gated Recurrent Unit (GRU) is used because it is computationally cheaper.
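A minimal PyTorch sketch of this encoding step, assuming a single bi-directional GRU layer for each of the character-level and word-level encoders and illustrative dimensions (the paper stacks more layers; class and variable names here are my own):

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Encodes a question or passage: a character-level BiGRU produces c_t,
    which is concatenated with the word embedding e_t and fed to a
    word-level BiGRU to give u_t."""
    def __init__(self, word_dim=300, char_dim=50, hidden=75):
        super().__init__()
        self.char_rnn = nn.GRU(char_dim, hidden, bidirectional=True, batch_first=True)
        self.word_rnn = nn.GRU(word_dim + 2 * hidden, hidden, bidirectional=True, batch_first=True)

    def forward(self, word_emb, char_emb):
        # word_emb: (batch, seq_len, word_dim)
        # char_emb: (batch, seq_len, chars_per_word, char_dim)
        b, t, c, d = char_emb.shape
        _, h_n = self.char_rnn(char_emb.view(b * t, c, d))   # final states of both directions
        char_feat = h_n.transpose(0, 1).reshape(b, t, -1)    # c_t: (batch, seq_len, 2*hidden)
        u, _ = self.word_rnn(torch.cat([word_emb, char_feat], dim=-1))
        return u                                             # u_t: (batch, seq_len, 2*hidden)
```

The same module (or two separate instances) can be applied to the question and the passage to obtain $u^Q$ and $u^P$.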

Gated Attention-based Recurrent Networks

Utilize a gated attention-based recurrent network (a variant of attention-based recurrent networks) to incorporate question information into passage representation.

Given $\{u_t^Q\}_{t=1}^m$ and $\{u_t^P\}_{t=1}^n$, generate the question-aware passage representation $\{v_t^P\}_{t=1}^n$ via soft alignment of the words in the question and passage:

$$v_t^P = \text{RNN}(v_{t-1}^P, [u_t^P, c_t]^*)$$

where $[u_t^P, c_t]^*$ is the gated input of the RNN:

$$g_t = \mathrm{sigmoid}(W_g [u_t^P, c_t]), \qquad [u_t^P, c_t]^* = g_t \odot [u_t^P, c_t]$$

and $c_t = \mathrm{att}(u^Q, [u_t^P, v_{t-1}^P])$ is an attention-pooling vector of the whole question $u^Q$ which focuses on the relation between the question and the current passage word:

$$
\begin{aligned}
s_j^t &= w^T \tanh(W_u^Q u_j^Q + W_u^P u_t^P + W_v^P v_{t-1}^P) \\
a_i^t &= \exp(s_i^t) \Big/ \sum\nolimits_{j=1}^{m} \exp(s_j^t) \\
c_t &= \sum\nolimits_{i=1}^{m} a_i^t u_i^Q
\end{aligned}
$$

where the vector $w^T$ and all matrices $W$ contain weights to be learned.
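A sketch of a single recurrence step of this gated attention-based recurrent network in PyTorch; the surrounding loop over passage positions, the bi-directional variant used in the paper, and all names below are assumptions for illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedAttentionCell(nn.Module):
    """One step: attention-pool the question u^Q against the current passage
    word and previous state, gate the concatenated input [u_t^P, c_t], and
    update the question-aware state v_t^P with a GRU cell."""
    def __init__(self, dim, hidden):
        super().__init__()
        self.W_uQ = nn.Linear(dim, hidden, bias=False)
        self.W_uP = nn.Linear(dim, hidden, bias=False)
        self.W_vP = nn.Linear(hidden, hidden, bias=False)
        self.w = nn.Linear(hidden, 1, bias=False)
        self.gate = nn.Linear(2 * dim, 2 * dim, bias=False)   # W_g
        self.cell = nn.GRUCell(2 * dim, hidden)

    def forward(self, uQ, uP_t, vP_prev):
        # uQ: (batch, m, dim), uP_t: (batch, dim), vP_prev: (batch, hidden)
        s = self.w(torch.tanh(self.W_uQ(uQ)
                              + self.W_uP(uP_t).unsqueeze(1)
                              + self.W_vP(vP_prev).unsqueeze(1))).squeeze(-1)  # s_j^t
        a = F.softmax(s, dim=-1)                               # a_i^t
        c_t = torch.bmm(a.unsqueeze(1), uQ).squeeze(1)         # attention-pooling of u^Q
        x = torch.cat([uP_t, c_t], dim=-1)
        x = torch.sigmoid(self.gate(x)) * x                    # g_t * [u_t^P, c_t]
        return self.cell(x, vP_prev)                           # v_t^P
```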

Self-matching Attention

Self-matching attention addresses the problem that the question-aware passage representation has only limited knowledge of the surrounding context. It dynamically
1. collects evidence from all the words of the passage
2. encodes the evidence relevant to the current passage word and its matching question information into the passage representation $h_t^P$:

$$h_t^P = \text{RNN}(h_{t-1}^P, [v_t^P, c_t]^*)$$

where $[v_t^P, c_t]^*$ is the gated input of the RNN (gated in the same way as above), and $c_t = \mathrm{att}(v^P, v_t^P)$ is an attention-pooling vector of the whole passage $v^P$ which focuses on the relation between the current passage word and the rest of the passage:

$$
\begin{aligned}
s_j^t &= w^T \tanh(W_v^P v_j^P + W_v^{\tilde{P}} v_t^P) \\
a_i^t &= \exp(s_i^t) \Big/ \sum\nolimits_{j=1}^{n} \exp(s_j^t) \\
c_t &= \sum\nolimits_{i=1}^{n} a_i^t v_i^P
\end{aligned}
$$

where the vector $w^T$ and all matrices $W$ contain weights to be learned.

After the self-matching layer, the authors utilize a bi-directional GRU to deeply integrate the matching results before feeding them into the answer pointer layer; this helps to further propagate the information aggregated by self-matching across the passage.
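A sketch of the self-matching layer in PyTorch, computing the passage-against-passage attention for all positions in parallel and then running the bi-directional GRU mentioned above; the vectorized formulation and all names are my own assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfMatchingLayer(nn.Module):
    """Each passage position attends over the whole question-aware passage
    representation v^P; the gated [v_t^P, c_t] is fed to a BiGRU to give h_t^P."""
    def __init__(self, dim, hidden):
        super().__init__()
        self.W_vP = nn.Linear(dim, hidden, bias=False)        # applied to v_j^P
        self.W_vP_tilde = nn.Linear(dim, hidden, bias=False)  # applied to v_t^P
        self.w = nn.Linear(hidden, 1, bias=False)
        self.gate = nn.Linear(2 * dim, 2 * dim, bias=False)
        self.rnn = nn.GRU(2 * dim, hidden, bidirectional=True, batch_first=True)

    def forward(self, vP):
        # vP: (batch, n, dim) question-aware passage representation
        s = self.w(torch.tanh(self.W_vP(vP).unsqueeze(1)           # index j on dim 2
                              + self.W_vP_tilde(vP).unsqueeze(2))  # index t on dim 1
                   ).squeeze(-1)                                   # (batch, n, n)
        a = F.softmax(s, dim=-1)
        c = torch.bmm(a, vP)                                       # c_t for every t
        x = torch.cat([vP, c], dim=-1)
        x = torch.sigmoid(self.gate(x)) * x                        # gated [v_t^P, c_t]
        h, _ = self.rnn(x)
        return h                                                   # h_t^P: (batch, n, 2*hidden)
```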

Output Layer

Use attention-pooling over the question representation to generate the initial hidden vector for the pointer network, which predicts the start and end positions of the answer.

Given the passage representation $\{h_t^P\}_{t=1}^n$, the attention mechanism is utilized as a pointer to select the start position $p^1$ and the end position $p^2$:

$$
\begin{aligned}
s_j^t &= w^T \tanh(W_h^P h_j^P + W_h^a h_{t-1}^a) \\
a_i^t &= \exp(s_i^t) \Big/ \sum\nolimits_{j=1}^{n} \exp(s_j^t) \\
p^t &= \arg\max(a_1^t, a_2^t, \ldots, a_n^t)
\end{aligned}
$$

where $h_{t-1}^a$ represents the last hidden state of the pointer network, and the input of the pointer network is the attention-pooling vector $c_t$ based on the current predicted probability $a^t$:

$$
\begin{aligned}
c_t &= \sum\nolimits_{i=1}^{n} a_i^t h_i^P \\
h_t^a &= \text{RNN}(h_{t-1}^a, c_t)
\end{aligned}
$$

The authors utilize the question vector $r^Q$ as the initial state of the pointer network, where $r^Q = \mathrm{att}(u^Q, V_r^Q)$ is an attention-pooling vector of the question based on the parameter $V_r^Q$:

$$
\begin{aligned}
s_j &= w^T \tanh(W_u^Q u_j^Q + W_v^Q V_r^Q) \\
a_i &= \exp(s_i) \Big/ \sum\nolimits_{j=1}^{m} \exp(s_j) \\
r^Q &= \sum\nolimits_{i=1}^{m} a_i u_i^Q
\end{aligned}
$$
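A sketch of this output layer in PyTorch, combining the $r^Q$ initialization with two pointer steps for the start and end positions (module and parameter names are assumptions, and the returned scores are pre-softmax):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PointerNetwork(nn.Module):
    """Attention-pools the question into r^Q, then runs a two-step pointer
    GRU over h^P to score the answer start and end positions."""
    def __init__(self, q_dim, p_dim, hidden):
        super().__init__()
        self.V_rQ = nn.Parameter(torch.randn(1, q_dim))      # trainable V_r^Q
        self.W_uQ = nn.Linear(q_dim, hidden, bias=False)
        self.W_vQ = nn.Linear(q_dim, hidden, bias=False)
        self.w_q = nn.Linear(hidden, 1, bias=False)
        self.W_hP = nn.Linear(p_dim, hidden, bias=False)
        self.W_ha = nn.Linear(q_dim, hidden, bias=False)
        self.w_p = nn.Linear(hidden, 1, bias=False)
        self.cell = nn.GRUCell(p_dim, q_dim)

    def forward(self, uQ, hP):
        # uQ: (batch, m, q_dim), hP: (batch, n, p_dim)
        s = self.w_q(torch.tanh(self.W_uQ(uQ) + self.W_vQ(self.V_rQ))).squeeze(-1)
        rQ = torch.bmm(F.softmax(s, dim=-1).unsqueeze(1), uQ).squeeze(1)  # initial state
        h_a, scores = rQ, []
        for _ in range(2):                                   # step 1: start, step 2: end
            s_t = self.w_p(torch.tanh(self.W_hP(hP)
                                      + self.W_ha(h_a).unsqueeze(1))).squeeze(-1)
            scores.append(s_t)                               # (batch, n) scores for p^t
            a = F.softmax(s_t, dim=-1)
            c = torch.bmm(a.unsqueeze(1), hP).squeeze(1)     # attention-pooling of h^P
            h_a = self.cell(c, h_a)
        return scores                                        # [start scores, end scores]
```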

Objective Function

To train the network, minimize the objective function:

$$J = -\left( \sum_{i=1}^{n} \mathbf{1}\{p^1 = i\} \log a_i^1 + \sum_{i=1}^{n} \mathbf{1}\{p^2 = i\} \log a_i^2 \right)$$
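Because each term of $J$ is just the negative log-probability of the true boundary position, the loss can be written as the sum of two cross-entropies over the pointer distributions; a minimal sketch, assuming score tensors like those returned by the pointer-layer sketch above:

```python
import torch.nn.functional as F

def answer_loss(start_scores, end_scores, p1, p2):
    # start_scores, end_scores: (batch, n) pre-softmax scores over passage positions
    # p1, p2: (batch,) gold start/end indices
    # cross_entropy applies log-softmax internally, so each term equals -log a^t at
    # the gold position, i.e. the indicator-sum form of J above
    return F.cross_entropy(start_scores, p1) + F.cross_entropy(end_scores, p2)
```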

Implementation Details

  1. Use the tokenizer from Stanford CoreNLP to preprocess each passage and question
  2. Use the Gated Recurrent Unit
  3. Use pre-trained GloVe embeddings for questions and passages and keep them fixed during training
  4. Use zero vectors to represent all out-of-vocabulary words
  5. Use 1 layer of bi-directional GRU to compute character-level embeddings and 3 layers of bi-directional GRU to encode questions and passages
  6. Use bi-directional gated attention-based recurrent network
  7. Set hidden vector length to 75 for all layers
  8. Set hidden size to 75 for attention scores
  9. Set dropout rate to 0.2
  10. Use AdaDelta (an initial learning rate of 1, a decay rate ρ of 0.95, and a constant ϵ of 1e−6); see the sketch below
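A minimal sketch of the optimizer setup from item 10, using PyTorch's built-in AdaDelta with the listed hyper-parameters (the placeholder model merely stands in for the full network's parameters):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 10)  # placeholder for the full model's parameters
# AdaDelta with initial learning rate 1, decay rate rho = 0.95, constant eps = 1e-6
optimizer = torch.optim.Adadelta(model.parameters(), lr=1.0, rho=0.95, eps=1e-6)
```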

Source: https://blog.csdn.net/seanliu96/article/details/79381810
