DFGN-Dynamically Fused Graph Network for Multi-hop Reasoning Paper Reading

Introduction

Applies the DFGN model to HotpotQA (a public text-based QA dataset).

Most QA tasks focus on finding evidence and answers in a single passage, and many require only extraction rather than reasoning. To address this, several datasets pose multi-hop reasoning tasks, such as WikiHop, ComplexWebQuestions, and HotpotQA.

There are two major challenges in the multi-hop reasoning task:

  1. Noise must be filtered out of multiple paragraphs to extract the useful information.
    • Some prior work builds entity graphs from the input paragraphs and uses GNNs to aggregate information across them.
    • Problem: a static, global entity graph with only implicit reasoning
    • This paper: builds a dynamic, local entity graph conditioned on the query, with explicit reasoning
  2. The answer may not lie within the entities of the extracted entity graph.
    • This paper: information integration in two directions
      • doc2graph: integrate document information into the entity graph
      • graph2doc: integrate entity-graph information back into the document representation

Contributions of this paper:

  • Proposes DFGN to solve the text-based multi-hop question answering problem
  • Proposes a method to interpret and evaluate reasoning chains by explaining the entity-graph masks predicted by DFGN
  • Experiments on the HotpotQA dataset verify the effectiveness of the model

Related Work

Text-based QA

Depending on whether the supporting information is structured, QA tasks fall into two categories:

  • Knowledge-based QA (KBQA)
  • Text-based QA (TBQA)

Based on the complexity of reasoning, they can also be divided into two categories:

  • Single Hop: SQuAD
  • Multi-hop: HotpotQA

Information retrieval (IR) methods can handle single-hop QA but struggle with multi-hop QA.

Multi-hop QA Inference

  • GNN variants such as GCN, GAT, and GRN (Graph Recurrent Network) have proven effective for the reasoning required in QA tasks
  • Coref-GRN aggregates coreference information with a GRN

Model


  • The model consists of five parts:
    • a paragraph selection subnetwork (paragraph selector)
    • an entity graph construction module
    • an encoding layer
    • a fusion block for multi-hop reasoning
    • a final prediction layer

Paragraph Selector

Each question in the HotpotQA dataset comes with 10 paragraphs.

A subnetwork built on BERT is trained as a classifier to select the relevant paragraphs:

  • Input: a query and a paragraph; output: a relevance score between 0 and 1
  • For each Q&A pair, a paragraph containing at least one supporting sentence is labeled 1
  • At inference time, paragraphs whose predicted score exceeds a threshold $\eta$ are selected and concatenated as the context $C$
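The inference-time selection step above can be sketched as follows. This is a minimal illustration, not the paper's code: `select_context`, the example paragraphs, and the scores are hypothetical stand-ins for the BERT classifier's outputs.

```python
def select_context(paragraphs, scores, eta=0.1):
    """Keep paragraphs with predicted score > eta and join them as context C."""
    kept = [p for p, s in zip(paragraphs, scores) if s > eta]
    return " ".join(kept)

paragraphs = ["Paris is the capital of France.",
              "Bananas are yellow.",
              "France is in Europe."]
scores = [0.9, 0.05, 0.8]  # stand-ins for BERT relevance scores
context = select_context(paragraphs, scores, eta=0.1)
```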

Build Entity Graph

  • Use the Stanford CoreNLP toolkit to recognize named entities in $C$, extracting $N$ entities
  • The entity graph takes entities as nodes; edges are added by the following rules:
    • a pair of entities appears in the same sentence of $C$ (sentence-level links)
    • a pair of mentions of the same entity appears in $C$ (context-level links)
    • a central entity is in the same paragraph as other entities (paragraph-level links)
  • In the HotpotQA dataset, the title of each paragraph serves as its central entity
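A minimal sketch of the three edge rules, not the paper's implementation. Each entity mention is represented here as a hypothetical dict carrying its surface text, sentence id, paragraph id, and a central-entity flag.

```python
def build_entity_graph(mentions):
    """Build a 0/1 adjacency matrix from the three linking rules."""
    n = len(mentions)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            a, b = mentions[i], mentions[j]
            same_sentence = a["sent"] == b["sent"]          # sentence-level link
            same_text = a["text"] == b["text"]              # context-level link
            central_pair = (a["para"] == b["para"] and
                            (a["central"] or b["central"]))  # paragraph-level link
            if same_sentence or same_text or central_pair:
                adj[i][j] = adj[j][i] = 1
    return adj

mentions = [
    {"text": "France", "sent": 0, "para": 0, "central": True},
    {"text": "Paris",  "sent": 0, "para": 0, "central": False},
    {"text": "France", "sent": 3, "para": 1, "central": False},
]
adj = build_entity_graph(mentions)
```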

Encoding Query and Context

  • Concatenate $Q$ and $C$ and feed them into a BERT model to obtain representations:
    • $Q = [q_1, \ldots, q_L] \in R^{L \times d_1}$
    • $C^T = [c_1, \ldots, c_M] \in R^{M \times d_1}$
    • $d_1$ is the hidden size of BERT
    • Experiments found that concatenating before passing into BERT works better than encoding the two separately
    • Passing these representations through a bi-attention layer to attend between query and context works better than BERT encoding alone
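A minimal BiDAF-style bi-attention sketch using a plain dot-product similarity; the paper's exact parametrization may differ, and the shapes here are arbitrary examples.

```python
import numpy as np

def bi_attention(Q, C):
    """Bidirectional attention between query states Q (L, d) and
    context states C (M, d); returns a fused context (M, 4d)."""
    S = C @ Q.T                                   # (M, L) similarity matrix
    # context-to-query: each context token attends over query tokens
    a = np.exp(S - S.max(axis=1, keepdims=True))
    a /= a.sum(axis=1, keepdims=True)
    c2q = a @ Q                                   # (M, d)
    # query-to-context: attend over context tokens via per-row max similarity
    b = np.exp(S.max(axis=1) - S.max())
    b /= b.sum()
    q2c = np.tile(b @ C, (C.shape[0], 1))         # (M, d)
    return np.concatenate([C, c2q, C * c2q, C * q2c], axis=1)

rng = np.random.default_rng(1)
Q = rng.normal(size=(5, 8))    # 5 query tokens, hidden size 8
C = rng.normal(size=(12, 8))   # 12 context tokens
G = bi_attention(Q, C)
```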

Fusion Block Reasoning


Doc2Graph

  • Compute embeddings of the named entities
  • A 0-1 matrix $M$ marks the text span of each entity
    • $M_{i,j} = 1$ means the $i$-th token is part of the $j$-th entity
  • Tok2Ent
    • Token embeddings within each entity span are passed through a mean-max pooling layer to compute the entity embedding
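The Tok2Ent mean-max pooling can be sketched as below; the tiny token matrix and span matrix are made-up examples.

```python
import numpy as np

def tok2ent(C, M):
    """Mean-max pool token embeddings into entity embeddings.
    C: (num_tokens, d) token embeddings; M: (num_tokens, N) 0-1 span matrix.
    Returns (N, 2d): [mean-pooled ; max-pooled] per entity."""
    ents = []
    for j in range(M.shape[1]):
        span = C[M[:, j] == 1]   # tokens belonging to entity j
        ents.append(np.concatenate([span.mean(axis=0), span.max(axis=0)]))
    return np.stack(ents)

C = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 0.0]])
M = np.array([[1, 0], [1, 0], [0, 1]])   # entity 0 = tokens 0-1, entity 1 = token 2
E = tok2ent(C, M)
```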

Dynamic Graph Attention

  • A GAT-style attention mechanism computes attention scores between pairs of entities
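A plain GAT-style propagation step as a sketch; it omits the paper's query-conditioned soft mask and multi-head details, and all sizes and parameters are made-up examples.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def gat_layer(E, adj, W, a):
    """One simplified GAT attention step over the entity graph.
    E: (N, d) entity embeddings; adj: (N, N) 0/1 adjacency;
    W: (d, h) projection; a: (2h,) attention vector."""
    H = E @ W
    N = H.shape[0]
    scores = np.full((N, N), -1e9)          # mask non-edges before softmax
    for i in range(N):
        for j in range(N):
            if adj[i, j]:
                scores[i, j] = leaky_relu(np.dot(a, np.concatenate([H[i], H[j]])))
    alpha = np.exp(scores - scores.max(axis=1, keepdims=True))
    alpha /= alpha.sum(axis=1, keepdims=True)  # softmax over neighbors
    return alpha @ H                           # aggregate neighbor messages

rng = np.random.default_rng(0)
E = rng.normal(size=(3, 4))
adj = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]])
W = rng.normal(size=(4, 4))
a = rng.normal(size=(8,))
E_next = gat_layer(E, adj, W, a)
```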

Update Query

  • The most recently visited entities become the start entities of the next step
  • A bi-attention network updates the query embedding:
  • $\mathbf{Q}^{(t)} = \text{Bi-Attention}\left(\mathbf{Q}^{(t-1)}, \mathbf{E}^{(t)}\right)$

Graph2Doc

  • Propagate the entity information back to the tokens in the context
  • Reuse the same 0-1 matrix $M$
    • each row of $M$ corresponds to a token
    • it selects each token's associated entity embedding from $E^{(t)}$, giving $M E^{(t)\top}$
  • An LSTM produces the next layer's context representation:
    • $\mathbf{C}^{(t)} = \operatorname{LSTM}\left(\left[\mathbf{C}^{(t-1)}, \mathbf{M} \mathbf{E}^{(t)\top}\right]\right)$
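The Graph2Doc scatter-and-fuse step can be sketched as follows. The numbers are made-up, and a hypothetical linear map stands in for the paper's LSTM to keep the sketch dependency-free.

```python
import numpy as np

# 3 tokens, 2 entities: row i of M_tok selects token i's entity (if any)
M_tok = np.array([[1, 0], [1, 0], [0, 1]])
E_t = np.array([[0.5, 0.5], [2.0, -1.0]])   # (N, d) entity embeddings
C_prev = np.ones((3, 2))                    # (num_tokens, d) previous context

token_entity = M_tok @ E_t                  # M E^(t): entity embedding per token
fused = np.concatenate([C_prev, token_entity], axis=1)  # [C^(t-1), M E^(t)T]
W = np.eye(4)[:, :2]                        # stand-in for the LSTM update
C_next = fused @ W                          # next-layer context representation
```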

Prediction

  • Same output structure as the original HotpotQA setting
  • Four outputs:
    1. supporting sentences
    2. start position of the answer
    3. end position of the answer
    4. answer type
  • Resolve output dependencies with a cascaded network
    • Four LSTM layers $\mathcal{F}_i$ are stacked layer upon layer
    • The context representation of the last fusion block is fed into the first LSTM
    • $\begin{aligned} \mathbf{O}_{\text{sup}} &= \mathcal{F}_0\left(\mathbf{C}^{(t)}\right) \\ \mathbf{O}_{\text{start}} &= \mathcal{F}_1\left(\left[\mathbf{C}^{(t)}, \mathbf{O}_{\text{sup}}\right]\right) \\ \mathbf{O}_{\text{end}} &= \mathcal{F}_2\left(\left[\mathbf{C}^{(t)}, \mathbf{O}_{\text{sup}}, \mathbf{O}_{\text{start}}\right]\right) \\ \mathbf{O}_{\text{type}} &= \mathcal{F}_3\left(\left[\mathbf{C}^{(t)}, \mathbf{O}_{\text{sup}}, \mathbf{O}_{\text{end}}\right]\right) \end{aligned}$
    • A cross-entropy loss is computed on each output logit $\mathbf{O}$
    • The four losses are combined with different weights: $\mathcal{L} = \mathcal{L}_{\text{start}} + \mathcal{L}_{\text{end}} + \lambda_s \mathcal{L}_{\text{sup}} + \lambda_t \mathcal{L}_{\text{type}}$
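The cascaded dependency between the four heads, and the weighted loss, can be sketched as below. Linear maps stand in for the LSTM layers $\mathcal{F}_i$, and the loss values and weights are made-up stand-ins for the paper's hyperparameters.

```python
import numpy as np

rng = np.random.default_rng(2)
C_t = rng.normal(size=(6, 4))   # (num_tokens, d) context from the last fusion block

def head(x, out_dim, rng):
    """Hypothetical linear head standing in for an LSTM layer F_i."""
    W = rng.normal(size=(x.shape[1], out_dim))
    return x @ W

# each head conditions on the context plus earlier heads' logits
O_sup = head(C_t, 1, rng)                                        # F0(C)
O_start = head(np.concatenate([C_t, O_sup], axis=1), 1, rng)     # F1([C, O_sup])
O_end = head(np.concatenate([C_t, O_sup, O_start], axis=1), 1, rng)
O_type = head(np.concatenate([C_t, O_sup, O_end], axis=1), 1, rng)

# combined loss with made-up per-head losses and weights
L_start, L_end, L_sup, L_type = 1.2, 0.8, 2.0, 0.4
lambda_s, lambda_t = 0.5, 0.25
L_total = L_start + L_end + lambda_s * L_sup + lambda_t * L_type
```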


Origin blog.csdn.net/iteapoy/article/details/128310109