【Paper Notes】Medical Dialogue Response Generation with Pivotal Information Recalling

Medical Dialogue Response Generation with Pivotal Information Recalling


  • Task: Medical Dialogue Generation
  • Conference: KDD 2022

1. Motivation

Shortcomings of existing work:

(1) The medical entity relationships between utterances are ignored.

(2) Key information in the dialogue history is not fully exploited.

In order to understand the patient's condition comprehensively, medical dialogues are often relatively long, with rich medical terminology scattered across the utterances. This scattered key information, in turn, plays a decisive role in inducing the next response.

Existing work is deficient in exploiting the complex medical relationships between different utterances, so existing medical dialogue generation models cannot extract key information from long dialogue histories to generate accurate and informative replies. The main reason is that the cross-attention mechanism receives no explicit supervision signal for recalling key information.

As shown in the figure, there is a symptom relationship between "tenesmus" in the first utterance and "enteritis" in the fourth. Because this medical relation is ignored, the key entity "colitis" may be missing from the generated reply.

(figure: motivating dialogue example showing the symptom relation between "tenesmus" and "enteritis")

2. Main idea

Explicitly model the complex medical relationships among multiple utterances, and guide the decoder to fully exploit the key information in the dialogue history during response generation.

This paper proposes Pivotal Information Recalling (MedPIR), which consists of two parts: a knowledge-aware dialogue graph encoder and a recall-enhanced generator. The knowledge-aware dialogue graph encoder builds a dialogue graph by exploiting the relations between entities in different utterances. The graph representation produced by graph attention is fed into the generator, so the knowledge-aware dialogue graph encoder helps the generator use key medical information distributed across multiple utterances from the perspective of the global dialogue structure.

The recall-enhanced generator then reinforces the use of key information by generating a summary of the dialogue before generating the actual reply. It is designed to first explicitly generate the key information recalled from the long dialogue history, and then use this recalled information as a prefix of the reply, so that the generated reply focuses more on the key information. In this way,

(1) The recall supervision signal forces the cross-attention mechanism to fully exploit the key information from the encoder.

(2) In addition, the recall-enhanced generator also strengthens the interaction between the key information recalled from the dialogue history and the replies through the self-attention mechanism inside the decoder.

3. Model

Use BERT-GPT as the backbone: BERT as the encoder and GPT as the decoder. The context encoder encodes the concatenated dialogue history to obtain the context representation $H_{ctx}$. Then, by retrieving external knowledge, the knowledge representation $H_k$ is obtained through a knowledge encoder.

3.1 Knowledge-aware Dialogue Graph Encoder (KDGE)

Motivation

Since the basic dialogue model only regards medical dialogue history as a sequence of utterances, it is difficult to model multiple medical causal relationships between different utterances, and these complex relationships imply key medical information that induces the next response . To deal with this problem, this paper proposes KDGE to construct a dialogue knowledge graph and encode the graph with a graph attention network.

Construction of Global Dialogue Knowledge Graph

First, the dialogue history is converted into a graph, with each utterance treated as a node. There are two types of edges between nodes: sequential (temporal) edges, and knowledge-aware edges that connect scattered utterances through medical relations. The knowledge-aware edges incorporate medical knowledge from an external medical knowledge graph into the dialogue, enabling the model to represent complex medical relationships between utterances.

Specifically, we first extract medical entities from each sentence, and then look up their relations in the external knowledge graph CMeKG. If there is some relation between medical entities from two utterances, we add a knowledge-aware edge between the two utterances.

(figure: construction of the knowledge-aware dialogue graph)
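The graph construction described above can be sketched in a few lines. This is an illustrative toy, not the paper's pipeline: the entity "extractor" is a substring match against a made-up vocabulary, and `TOY_KG` is a hypothetical stand-in for a CMeKG relation lookup.

```python
# Toy sketch: build a dialogue graph with temporal and knowledge-aware edges.
# TOY_KG and extract_entities are hypothetical stand-ins for CMeKG lookup
# and a real medical entity extractor.

TOY_KG = {  # (head, tail) -> relation, mimicking a tiny CMeKG fragment
    ("tenesmus", "enteritis"): "symptom_of",
}

def extract_entities(utterance, vocab=("tenesmus", "enteritis")):
    """Naive entity 'extraction': substring match against a toy vocabulary."""
    return [e for e in vocab if e in utterance]

def build_dialogue_graph(utterances):
    """Nodes are utterance indices; edges are (i, j, edge_type) triples."""
    edges = []
    # 1) sequential (temporal) edges between adjacent utterances
    for i in range(len(utterances) - 1):
        edges.append((i, i + 1, "temporal"))
    # 2) knowledge-aware edges between utterances whose entities are related
    ents = [extract_entities(u) for u in utterances]
    for i in range(len(utterances)):
        for j in range(i + 1, len(utterances)):
            for a in ents[i]:
                for b in ents[j]:
                    if (a, b) in TOY_KG or (b, a) in TOY_KG:
                        edges.append((i, j, "knowledge"))
    return edges

dialog = [
    "I have tenesmus and abdominal pain.",
    "How long has this lasted?",
    "About three days.",
    "It may be enteritis; please get a stool test.",
]
print(build_dialogue_graph(dialog))
```

Running this links utterances 1 and 4 with a knowledge-aware edge through the "tenesmus"/"enteritis" relation, on top of the ordinary temporal chain.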

Structural Encoding of Dialogue Knowledge Graphs

After constructing the knowledge-aware dialogue graph, we use a Relational Graph Attention Network (RGAT; "Relational Graph Attention Networks", 2019) to encode the key relational information in the dialogue.

  • Sentence-level representation of each utterance: for each node $v_i$ in graph $G$, a Transformer-based encoder encodes the corresponding utterance, and the average of its word representations (i.e., mean pooling) gives the sentence-level representation $h_i$.

  • Node representation of the utterance in the dialogue graph: this sentence-level representation is then concatenated with the corresponding speaker representation (speaker embedding) to form the initial node embedding $v_i^0$.

    Finally, RGAT is used to update the node representations:
    $$(v_1, \dots, v_M) = \mathrm{RGAT}((v_1^0, \dots, v_M^0), G)$$

  • Recall score for utterances: for dialogue recall, the context encoding $H_{ctx}$ serves as the initial dialogue-history representation, and a recall score $\alpha_{v_i}$ is defined as the importance of utterance $X_i$ during recall:
    $$\alpha_{v_i} = \sigma\left((W_v^q h_{ctx})^T (W_v^k v_i)\right)$$
    Here $h_{ctx}$ is obtained by mean-pooling $H_{ctx}$, the BERT encoding of the concatenated dialogue history.

    As in dot-product attention, the context serves as the query and the graph representation of each single utterance (node) serves as the key when computing this score.

  • Final structural encoding of the utterance: the structural encoding of $X_i$ is then obtained by weighting the sum of its sentence encoding and graph node encoding with the recall score:

$$h_{stc,i} = \alpha_{v_i}(h_i + v_i)$$

  • Structural encoding of the dialogue history: finally, the structural representations of all utterances are concatenated to obtain the structural encoding of the dialogue: $H_{stc}$.

    To summarize, how is the structural encoding of the dialogue history obtained? First, encode each utterance in the dialogue history to obtain a sentence-level representation; then update the node representations in the dialogue graph to obtain node-level representations. Next, compute the recall score of each utterance, with the pooled dialogue-history context as the query and the node representations as the keys, and weight the sum of the sentence and node encodings by this score to obtain the structural representation of each utterance. Finally, concatenate the structural encodings of all utterances to obtain the structural encoding of the dialogue history.
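The recall-score and weighting steps above can be sketched with numpy. This is a minimal sketch under stated assumptions: the RGAT update is skipped (node encodings $v_i$ are taken as given random stand-ins), and the projection matrices are random rather than learned.

```python
# Sketch of the KDGE structural encoding:
#   alpha_{v_i} = sigmoid((W_q h_ctx)^T (W_k v_i))
#   h_{stc,i}   = alpha_{v_i} * (h_i + v_i)
# h, v, and the W matrices are random stand-ins for learned quantities.
import numpy as np

rng = np.random.default_rng(0)
d, M = 8, 4                    # hidden size, number of utterances
h = rng.normal(size=(M, d))    # sentence-level encodings h_i (mean-pooled)
v = rng.normal(size=(M, d))    # node encodings v_i (after RGAT, assumed given)
h_ctx = h.mean(axis=0)         # mean-pooled dialogue-history context
W_q = rng.normal(size=(d, d))  # query projection W_v^q
W_k = rng.normal(size=(d, d))  # key projection W_v^k

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Recall score per utterance: context as query, node encodings as keys.
alpha = sigmoid((W_q @ h_ctx) @ (W_k @ v.T))   # shape (M,)
# Structural encoding: recall-weighted sum of sentence and node encodings.
H_stc = alpha[:, None] * (h + v)               # shape (M, d)
print(alpha.shape, H_stc.shape)
```

The sigmoid (rather than a softmax over utterances) means each utterance gets an independent importance in $(0, 1)$, which matches the score's later use as a per-utterance weight.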

3.2 Recall-Enhanced Generator (REG)

Motivation

In the base model, the generator first performs self-attention to produce a decoding state at each time step, and then performs cross-attention with the context representation $H_{ctx}$ and the knowledge representation $H_k$. When trained only for reply generation, such a model finds it difficult to model a long dialogue history and attend to the key information within it.

Generate recall information

This paper proposes REG to explicitly generate the key information $R$ before generating the reply. $R$ is a concise summary containing the key information of the dialogue history. After generating $R$, the model generates the subsequent reply:
$$y_t = \mathrm{REG}(H_{ctx}, H_k, H_{stc}, [R; y_{<t}])$$
During training, $R$ is automatically produced by the medical pre-trained model PCL-MedBERT and serves as the supervision signal for training the model to recall key information. During inference, MedPIR first generates the recall information and then generates the reply; the supervision signal is thus decoupled from the PCL-MedBERT module. This approach has two advantages:

  • The pre-generated $R$ provides a shortcut for the generator to access key historical information through self-attention.

  • The cross-attention mechanism is enhanced to focus on the key information provided by the encoder. (Presumably because the representations of $R$ produced by self-attention interact with the encoder through cross-attention in the next step?)

    As shown in the figure, the generator first generates the recall information $R$, followed by a separator [RSEP]. Note that the mean pooling of the knowledge encoding $H_k$ is used as the embedding of the separator, to drive knowledge fusion during generation.

(figure: recall-enhanced generation with the [RSEP] separator)
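The decoder-input layout described above, with the recall as a prefix, can be made concrete with a small sketch. The token strings are illustrative only; the real model operates on subword ids, and [RSEP]'s embedding comes from pooled $H_k$ rather than a vocabulary entry.

```python
# Sketch of the REG decoder-side sequence: the recall R comes first,
# then the [RSEP] separator, then the reply tokens, so the reply is
# conditioned on [R; RSEP; y_{<t}] through self-attention.
# Token strings are illustrative, not the model's actual vocabulary.

def assemble_decoder_sequence(recall_tokens, reply_tokens, sep="[RSEP]"):
    """Return the flat decoder input with the recall as a prefix."""
    return recall_tokens + [sep] + reply_tokens

R = ["tenesmus", "enteritis"]                 # recalled key information
reply_so_far = ["Consider", "a", "stool", "test"]
print(assemble_decoder_sequence(R, reply_so_far))
```

Because every reply token self-attends over the whole prefix, the recalled entities are always within attention reach, which is the "shortcut" advantage noted above.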

Information aggregation

To aggregate the different types of information from the encoders, an aggregation module Fusion(·) is introduced after the self-attention (SA) and layer normalization (LN) modules. It is a gating mechanism that combines the context encoding $H_{ctx}$, the structural encoding $H_{stc}$, and the knowledge encoding $H_k$.

First, the output of SA and LN, $h_{S,t}^l$, is used as the query, and the three types of information are used as keys for cross-attention. A gating score is then obtained through a linear layer and a sigmoid, the three scores are normalized with softmax, and the results are combined by weighted summation:

(equations shown as images in the original post)

At the final generation step:

(equation shown as an image in the original post)

During information recall and reply generation, the gated aggregation network dynamically controls the information flow of the context encoding, structural encoding, and knowledge encoding. The structural encoding provides information complementary to the context encoding, facilitating the generation of the key-information recall.
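The gated fusion described in words above can be sketched as follows. Since the original equations are only available as images, this is an assumed parameterization consistent with the prose: one attention pass per source, a sigmoid gate per attended vector, softmax over the three gates, and a weighted sum. The paper's exact formulation may differ.

```python
# Sketch of Fusion(.): the decoder state attends over H_ctx, H_stc, H_k,
# each attended vector gets a sigmoid gate, the three gates are softmax-
# normalized, and the outputs are combined by weighted summation.
# All parameters are random stand-ins for learned weights.
import numpy as np

rng = np.random.default_rng(1)
d = 8
H_ctx = rng.normal(size=(5, d))   # context encoding (5 tokens)
H_stc = rng.normal(size=(4, d))   # structural encoding (4 utterances)
H_k   = rng.normal(size=(3, d))   # knowledge encoding (3 entities)
h_s   = rng.normal(size=d)        # decoder state after SA + LN (the query)
w_gate = rng.normal(size=(3, d))  # one gating vector per source

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attend(q, H):
    """Single-head scaled dot-product attention of query q over rows of H."""
    a = softmax(H @ q / np.sqrt(len(q)))
    return a @ H

attended = [attend(h_s, H) for H in (H_ctx, H_stc, H_k)]
gates = softmax(np.array([sigmoid(w_gate[i] @ attended[i]) for i in range(3)]))
fused = sum(g * a for g, a in zip(gates, attended))   # shape (d,)
print(gates, fused.shape)
```

The softmax over the three gates is what makes the aggregation a competition between sources: when the structural encoding is more relevant at a step, its gate grows at the expense of the other two.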

Training

Due to the lack of medical dialogue summarization corpora, this paper uses PCL-MedBERT for extractive summarization, selecting the dialogue-history utterances most relevant to the target reply as the training signal. PCL-MedBERT encodes each utterance and the reply, and then their cosine similarity is computed:

(equation shown as an image in the original post)

Then, the K utterances with the highest similarity scores are selected and concatenated as the target recall information. Although this is a relatively coarse supervision signal, the extracted summary utterances usually contain the key information needed to generate an informative medical reply, as shown in the figure below:

(figure: example of extracted recall utterances)
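The recall-target construction is simple enough to sketch end to end. The embeddings here are random stand-ins for PCL-MedBERT outputs; only the ranking-and-concatenation logic reflects the procedure described above.

```python
# Sketch of building the recall target R: score each history utterance by
# cosine similarity to the gold reply embedding, keep the top-K, and
# concatenate them in original dialogue order.
# Embeddings are random stand-ins for PCL-MedBERT [CLS]/pooled outputs.
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def build_recall_target(utt_embs, reply_emb, utterances, k=2):
    scores = [cosine(e, reply_emb) for e in utt_embs]
    top = sorted(range(len(scores)), key=lambda i: -scores[i])[:k]
    top.sort()                          # keep original dialogue order
    return " ".join(utterances[i] for i in top), scores

rng = np.random.default_rng(2)
utts = ["u1", "u2", "u3", "u4"]
embs = rng.normal(size=(4, 8))
reply = rng.normal(size=8)
target_R, scores = build_recall_target(embs, reply, utts, k=2)
print(target_R)
```

Whether the selected utterances keep dialogue order or similarity order is not specified in these notes; dialogue order is assumed here since the target reads as a summary.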

To further encourage the model to generate a qualified $R$ during inference, a supervision signal on the recall scores is introduced to facilitate the identification of key utterances:

(equation shown as an image in the original post)

where $r_i \in \{0,1\}$ indicates whether utterance $X_i$ belongs to the recall. The higher the recall score, the more important the corresponding utterance is for the recall.

The final optimization objective consists of two independent sub-losses, for recall-information generation and reply generation:

(equations shown as images in the original post)

External knowledge and the retriever

The MedDG work shows that predicting the medical entities likely to appear in the reply is very helpful for generating informative replies. Therefore, this paper trains its own knowledge-retrieval model to retrieve the medical entities that may appear in the reply.

First, taking the medical entities appearing in the dialogue history as central nodes, subgraphs with one-hop relations are selected from CMeKG, and retrieval is restricted to the entities in these subgraphs. Two independent PCL-MedBERT models encode the dialogue history $X$ and an entity $E$ (consisting of several tokens) separately, taking the [CLS] output of each encoder to obtain $h_X$ and $h_E$. The inner product of the two is the retrieval score of the entity. Let $E_i^+$ be a positive entity appearing in the target reply and $\{E_j^-\}_{j=1}^n$ be $n$ negative entities that do not appear; the retriever is trained by optimizing the following loss:

(equation shown as an image in the original post)

We retrieve the top-20 entities for the dialogue history, sort them by retrieval score, concatenate them into a sequence with [SEP], and feed the sequence into another PCL-MedBERT, used as the knowledge encoder, to obtain $H_k$.
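The dual-encoder scoring and its contrastive training loss can be sketched as follows. Since the loss equation is only available as an image, a standard softmax negative log-likelihood over one positive and $n$ negatives is assumed; the [CLS] vectors are random stand-ins for PCL-MedBERT outputs.

```python
# Sketch of the entity retriever: history and entity are encoded separately,
# the inner product of their [CLS] vectors is the retrieval score, and a
# softmax over one positive and n negatives gives a contrastive NLL loss.
# Assumed loss form; the paper's exact equation is an image in the post.
import numpy as np

rng = np.random.default_rng(3)
d = 8
h_X = rng.normal(size=d)             # [CLS] encoding of the dialogue history
h_E_pos = rng.normal(size=d)         # positive entity (appears in the reply)
h_E_negs = rng.normal(size=(5, d))   # n = 5 negative entities

def retrieval_loss(h_x, h_pos, h_negs):
    """Negative log-likelihood of the positive entity under a softmax
    over inner-product retrieval scores."""
    scores = np.concatenate([[h_x @ h_pos], h_negs @ h_x])
    scores -= scores.max()                       # numerical stability
    log_probs = scores - np.log(np.exp(scores).sum())
    return -log_probs[0]

loss = retrieval_loss(h_X, h_E_pos, h_E_negs)
print(loss)
```

At inference, the same inner-product scores rank all candidate entities in the one-hop subgraphs, and the top-20 go to the knowledge encoder.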


Origin blog.csdn.net/m0_47779101/article/details/131580957