Introduction
This paper applies the DFGN model to HotpotQA (a public text-based QA dataset).
Traditional QA tasks focus on finding evidence and answers in a single passage, and some require only extraction rather than reasoning. To address this, several datasets pose multi-hop understanding tasks, such as WikiHop, ComplexWebQuestions, and HotpotQA.
There are two major challenges in multi-hop understanding tasks:
- Noise must be filtered out of multiple paragraphs to extract the useful information.
  - Some prior work builds entity graphs from the input paragraphs and uses a GNN to aggregate information over them.
  - Problem: the global entity graph is static and the reasoning is implicit.
  - This paper: build a dynamic local entity graph conditioned on the query, with explicit reasoning.
- The answer may not appear among the entities of the extracted entity graph.
  - This paper: integrate information in two directions.
    - doc2graph: fuse document information into the entity graph.
    - graph2doc: fuse entity-graph information back into the document representation.
Contributions of this paper:
- Propose DFGN to solve text-based multi-hop question answering.
- Propose a method to interpret and evaluate reasoning chains by explaining the entity-graph masks predicted by DFGN.
- Verify the effectiveness of the model with experiments on the HotpotQA dataset.
Related Work
text-based QA
Depending on whether the supporting information is structured, QA tasks can be divided into two categories:
- Knowledge-based QA (KBQA)
- Text-based QA (TBQA)
Based on the complexity of the required reasoning, they can also be divided into two categories:
- Single Hop: SQuAD
- Multi-hop: HotpotQA
Information retrieval (IR) methods work for single-hop QA but struggle with multi-hop QA.
Multi-hop QA inference
- Graph neural networks such as GCN, GAT, and GRN (Graph Recurrent Network) have been applied to perform the reasoning required in QA tasks
- Coref-GRN utilizes GRN
Model
- The model consists of five parts:
  - a paragraph selection subnetwork (paragraph selector)
  - an entity graph construction module (entity graph generator)
  - an encoding layer
  - a fusion block for multi-hop reasoning
  - a final prediction layer
Paragraph Selector
Each question in the HotpotQA dataset comes with 10 paragraphs.
Train a BERT-based subnetwork to select relevant paragraphs:
- Input a query and a paragraph; output a relevance score between 0 and 1.
- For each QA pair, a paragraph is labeled 1 if it contains at least one supporting sentence.
- At inference time, select the paragraphs whose predicted score is greater than $\eta$ and concatenate them as the context $C$.
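The inference step above can be sketched as follows. The threshold `eta`, the paragraphs, and the scores are all illustrative; the trained BERT scorer that produces the scores is assumed to exist elsewhere.

```python
def select_context(paragraphs, scores, eta=0.5):
    """Keep paragraphs whose predicted relevance exceeds eta and
    concatenate them into the context C."""
    selected = [p for p, s in zip(paragraphs, scores) if s > eta]
    return " ".join(selected)

# Hypothetical scores from the BERT-based paragraph selector
paragraphs = ["Alice was born in Paris.", "Bob likes tea.", "Paris is in France."]
scores = [0.9, 0.1, 0.8]
context = select_context(paragraphs, scores)
# -> "Alice was born in Paris. Paris is in France."
```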
Build Entity Graph
- Use the Stanford CoreNLP toolkit to recognize named entities in $C$, obtaining $N$ entities
- Entity graph, with entities as points, the rules for adding edges are as follows:
  - a pair of entities appears in the same sentence of $C$ (sentence-level links)
  - a pair of entities in $C$ shares the same mention text (context-level links)
  - the central entity of a paragraph co-occurs with the other entities in that paragraph (paragraph-level links)
- In this dataset, the central entity of each paragraph is taken from its title
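The three edge rules can be sketched with a toy representation, assuming each entity is given as a `(mention_text, sentence_id, paragraph_id)` tuple and the central entities are known from the paragraph titles; this data layout is illustrative, not from the paper's code.

```python
from itertools import combinations

def build_edges(entities, central):
    """entities: list of (mention, sent_id, para_id) tuples;
    central: set of indices of central entities.
    Returns a set of undirected edges (i, j) with i < j."""
    edges = set()
    for i, j in combinations(range(len(entities)), 2):
        mi, si, pi = entities[i]
        mj, sj, pj = entities[j]
        if si == sj:                                      # sentence-level link
            edges.add((i, j))
        if mi == mj:                                      # context-level link
            edges.add((i, j))
        if pi == pj and (i in central or j in central):   # paragraph-level link
            edges.add((i, j))
    return edges
```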
Encoding Query and Context
- Concatenate $Q$ and $C$ and feed them into BERT to obtain representations:
  - $Q = [q_1, \ldots, q_L] \in R^{L \times d_1}$
  - $C^T = [c_1, \ldots, c_M] \in R^{M \times d_1}$
  - $d_1$ is the hidden size of BERT
- Experiments show that concatenating before passing into BERT works better than encoding query and context separately
- Passing the representations through a bi-attention layer to model the interaction between query and context works better than BERT encoding alone
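The bi-attention step can be sketched as a BiDAF-style layer in NumPy. The dot-product similarity below is a simplifying assumption; the paper's layer may use a different (e.g., trilinear) similarity function.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def bi_attention(Q, C):
    """BiDAF-style bi-attention between query Q (L, d) and context C (M, d).
    Returns a query-aware context representation of shape (M, 4d)."""
    S = C @ Q.T                               # (M, L) similarity matrix
    c2q = softmax(S, axis=1) @ Q              # context-to-query attention: (M, d)
    b = softmax(S.max(axis=1))                # query-to-context weights: (M,)
    q2c = np.tile(b @ C, (C.shape[0], 1))     # broadcast to every token: (M, d)
    return np.concatenate([C, c2q, C * c2q, C * q2c], axis=1)
```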
Fusion Block Reasoning
Doc2Graph
- Compute embeddings of named entities
  - A binary matrix $M$ marks the text span of each entity
    - $M_{i,j} = 1$ means the $i$-th token is part of the $j$-th entity
  - Tok2Ent
    - Pass the token embeddings through a mean-max pooling layer to compute the entity embeddings
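Tok2Ent's mean-max pooling can be sketched in NumPy, with the binary span matrix $M$ as defined above; the loop form is for clarity, not efficiency.

```python
import numpy as np

def tok2ent(C, M):
    """Mean-max pooling over each entity's token span (Tok2Ent sketch).
    C: (num_tokens, d) token embeddings.
    M: (num_tokens, num_entities) binary matrix, M[i, j] = 1 iff
       token i belongs to entity j.
    Returns (num_entities, 2d) entity embeddings [mean ; max]."""
    out = []
    for j in range(M.shape[1]):
        span = C[M[:, j] == 1]                # tokens of entity j
        out.append(np.concatenate([span.mean(axis=0), span.max(axis=0)]))
    return np.stack(out)
```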
Dynamic Graph Attention
- Use GAT to compute attention scores between pairs of entities
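A single-head GAT layer (following Veličković et al.) over the entity graph can be sketched in NumPy; the weight matrix `W` and attention vector `a` are illustrative parameters, and DFGN's dynamic variant adds further machinery on top of this basic scoring.

```python
import numpy as np

def masked_softmax(scores, adj):
    """Softmax over graph neighbors only (adj[i, j] = 1 if edge exists)."""
    scores = np.where(adj > 0, scores, -1e9)
    e = np.exp(scores - scores.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def gat_layer(E, adj, W, a):
    """One single-head GAT layer.
    E: (N, d) entity embeddings; adj: (N, N) adjacency;
    W: (d, d') projection; a: (2d',) attention vector."""
    H = E @ W                                    # project: (N, d')
    N = H.shape[0]
    logits = np.zeros((N, N))
    for i in range(N):
        for j in range(N):
            z = a @ np.concatenate([H[i], H[j]])  # e_ij = LeakyReLU(a^T[h_i;h_j])
            logits[i, j] = z if z > 0 else 0.2 * z
    alpha = masked_softmax(logits, adj)           # attention over neighbors
    return alpha @ H                              # aggregated entity embeddings
```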
Update Query
- The most recently visited entities become the start entities of the next step
- Use a bi-attention network to update the query embeddings:
- $\mathbf{Q}^{(t)} = \text{Bi-Attention}\left(\mathbf{Q}^{(t-1)}, \mathbf{E}^{(t)}\right)$
Graph2Doc
- Restore the entity information back to the tokens of the context
- Reuse the same binary matrix $M$
  - Each row of $M$ corresponds to a token
  - Use $M$ to select each token's entity embedding from $E^{(t)}$, giving $\mathbf{M}\mathbf{E}^{(t)\top}$
- Use an LSTM to generate the next layer's context representation:
- $\mathbf{C}^{(t)} = \operatorname{LSTM}\left(\left[\mathbf{C}^{(t-1)}, \mathbf{M} \mathbf{E}^{(t)\top}\right]\right)$
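The Graph2Doc scatter-and-concatenate step can be sketched in NumPy; the LSTM that consumes the result is omitted here, so this only builds its input. Shapes are illustrative.

```python
import numpy as np

def graph2doc_input(C_prev, M, E):
    """Scatter entity embeddings back to tokens via the binary span matrix M
    and concatenate with the previous context representation. The result
    [C^(t-1), M E^T] is then fed to an LSTM to produce C^(t) (sketch only).
    C_prev: (num_tokens, d_c); M: (num_tokens, N); E: (N, d_e)."""
    token_entity = M @ E                              # (num_tokens, d_e)
    return np.concatenate([C_prev, token_entity], axis=1)
```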
Prediction
- Same output format as the HotpotQA baseline
- Four outputs:
  - supporting sentences
  - start position of the answer
  - end position of the answer
- type of answer
- Resolve the dependencies among the outputs with a cascaded network
- Stack four LSTM layers $\mathcal{F}_i$
- The context representation of the last fusion block is fed into the first LSTM
- $$\begin{aligned} \mathbf{O}_{\text{sup}} &= \mathcal{F}_0\left(\mathbf{C}^{(t)}\right) \\ \mathbf{O}_{\text{start}} &= \mathcal{F}_1\left(\left[\mathbf{C}^{(t)}, \mathbf{O}_{\text{sup}}\right]\right) \\ \mathbf{O}_{\text{end}} &= \mathcal{F}_2\left(\left[\mathbf{C}^{(t)}, \mathbf{O}_{\text{sup}}, \mathbf{O}_{\text{start}}\right]\right) \\ \mathbf{O}_{\text{type}} &= \mathcal{F}_3\left(\left[\mathbf{C}^{(t)}, \mathbf{O}_{\text{sup}}, \mathbf{O}_{\text{end}}\right]\right) \end{aligned}$$
- Compute the cross-entropy loss over the logits $\mathbf{O}$ of each output
- Combine the four losses with different weights: $\mathcal{L} = \mathcal{L}_{\text{start}} + \mathcal{L}_{\text{end}} + \lambda_s \mathcal{L}_{\text{sup}} + \lambda_t \mathcal{L}_{\text{type}}$
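The cascaded dependency structure and the weighted loss can be sketched as follows; the callables `f0`..`f3` stand in for the paper's four LSTM layers, and the per-output losses are assumed to be computed elsewhere.

```python
import numpy as np

def cascade(C, f0, f1, f2, f3):
    """Cascaded prediction sketch: each head sees the fusion-block context
    plus the logits of earlier heads, mirroring the dependency
    sup -> start -> end and sup -> end -> type."""
    O_sup   = f0(C)
    O_start = f1(np.concatenate([C, O_sup], axis=-1))
    O_end   = f2(np.concatenate([C, O_sup, O_start], axis=-1))
    O_type  = f3(np.concatenate([C, O_sup, O_end], axis=-1))
    return O_sup, O_start, O_end, O_type

def total_loss(L_start, L_end, L_sup, L_type, lam_s=1.0, lam_t=1.0):
    """Weighted combination of the four cross-entropy losses."""
    return L_start + L_end + lam_s * L_sup + lam_t * L_type
```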