【Paper Notes】Knowledge-Driven Encode, Retrieve, Paraphrase for Medical Image Report Generation (AAAI 2019)

Original paper: https://arxiv.org/pdf/1903.10122.pdf

Abstract

  • Knowledge-driven Encode, Retrieve, Paraphrase (KERP) approach
  • decomposes medical report generation into explicit medical abnormality graph learning and subsequent natural language modeling

visual features --(Encode module)--> an abnormality graph --(Retrieve module)--> sequences of templates --(Paraphrase module)--> sequences of words

  • generates structured and robust reports supported with accurate abnormality prediction
  • produces explainable attentive regions, which are crucial for interpretative diagnosis
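The three-stage pipeline above can be sketched as a chain of functions. This is a minimal illustration of the data flow only; all dimensions, weights, and function bodies are hypothetical placeholders, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(visual_features):
    """Encode: visual features -> abnormality-node probabilities (N = 5 nodes here, hypothetical)."""
    logits = visual_features @ rng.standard_normal((visual_features.shape[-1], 5))
    return 1.0 / (1.0 + np.exp(-logits.mean(axis=0)))  # sigmoid over pooled logits

def retrieve(abnormality_probs, n_templates=4):
    """Retrieve: abnormality graph -> one template index per node (argmax)."""
    scores = rng.standard_normal((abnormality_probs.shape[0], n_templates))
    return scores.argmax(axis=-1)

def paraphrase(template_ids, vocab_size=10):
    """Paraphrase: template sequence -> word indices forming the report."""
    return [int(t) % vocab_size for t in template_ids]

X = rng.standard_normal((49, 16))  # e.g. a 7x7 visual feature map, flattened
u = encode(X)                      # abnormality probabilities in (0, 1)
t = retrieve(u)                    # template ids
report = paraphrase(t)             # word ids
```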

 

GTR

  • core of KERP

  • dynamically transforms high-level semantics between graph-structured data of multiple domains such as knowledge graphs, images and sequences

GTR as a module

  1. concatenating intra-graph message passing and inter-graph message passing into one step
    1. conduct message passing within the target graph
    2. pass messages from one or multiple source graphs
  2. stacking multiple such steps into one module
    • converts target graph features into high-level semantics
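The two sub-steps above can be sketched with attention as the message-passing primitive. This is an assumption-laden sketch (scaled dot-product attention without learned projections or normalization layers), not the paper's exact GTR formulation.

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: each query aggregates messages from keys/values."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V

def gtr_step(target, source):
    """One GTR step: intra-graph message passing on the target,
    then inter-graph message passing from the source graph."""
    target = target + attention(target, target, target)  # 1. within the target graph
    target = target + attention(target, source, source)  # 2. from the source graph
    return target

out = gtr_step(np.ones((3, 4)), np.ones((5, 4)))  # 3 target nodes, 5 source nodes
```

Stacking several `gtr_step` calls (with learned weights in practice) forms one GTR module.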


GTR across multiple domains


  • $\text{GTR}_\text{i2g}$: image features --> graph features
  • $\text{GTR}_\text{g2s}$: input - graph; output - sequence
  • $\text{GTR}_\text{g2g}$: one graph --> another graph
    • abnormality graph --> disease graph
  • $\text{GTR}_\text{gs2s}$: input - graph & sequence; output - sequence

GTR for sequential input/output

positional encoding – relative and absolute position information
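The absolute part can be sketched with the standard sinusoidal positional encoding; the paper's exact relative/absolute scheme may differ, so treat this as the textbook formulation rather than KERP's implementation.

```python
import numpy as np

def positional_encoding(n_positions, d_model):
    """Sinusoidal absolute positional encoding (Transformer-style).
    Even dimensions use sine, odd dimensions use cosine."""
    pos = np.arange(n_positions)[:, None]          # (n_positions, 1)
    i = np.arange(d_model)[None, :]                # (1, d_model)
    angles = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

pe = positional_encoding(10, 8)  # one d_model-dim vector per position
```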

 

KERP

Encode module

  • transforms visual features into a structured abnormality graph by incorporating prior medical knowledge
  • each node represents a possible clinical abnormality

updated node features:
$$\text{h}_u = \text{GTR}_\text{i2g}(\text{X}) \\ \text{u} = \text{sigmoid}(\text{W}_u \text{h}_u)$$

  • $\text{W}_u$: linear projection that transforms each latent node feature into a one-dimensional probability
  • $\text{h}_u = (\text{h}_{u_1}; \text{h}_{u_2}; ...; \text{h}_{u_N}) \in R^{N \times d}$: the set of latent node features, where $d$ is the feature dimension
  • $\text{u} = (u_1, u_2, ..., u_N)$: predicted abnormality probabilities, supervised by binary labels $y_i \in \{0, 1\}$, $i \in \{1, ..., N\}$, for the abnormality nodes
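The node-probability equation $\text{u} = \text{sigmoid}(\text{W}_u \text{h}_u)$ can be sketched directly; $N$ and $d$ below are illustrative, not the paper's values.

```python
import numpy as np

N, d = 6, 8                          # N abnormality nodes, d-dim features (illustrative)
rng = np.random.default_rng(1)
h_u = rng.standard_normal((N, d))    # latent node features from GTR_i2g
W_u = rng.standard_normal((d, 1))    # projects each node feature to one scalar
# sigmoid turns each node's score into an abnormality probability
u = (1.0 / (1.0 + np.exp(-(h_u @ W_u)))).squeeze(-1)
```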

Retrieve module

  • retrieves text templates based on the detected abnormalities

obtain template sequence:
$$\text{h}_t = \text{GTR}_\text{g2s}(\text{h}_u) \\ \text{t} = \text{argmax} \, \text{Softmax}(\text{W}_t \text{h}_t)$$

  • $\text{W}_t$: linear projection that transforms latent features into template embeddings
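The template-selection equation can be sketched as below. Note that since softmax is monotonic, taking argmax over the logits gives the same templates as argmax over the softmax; sizes are illustrative.

```python
import numpy as np

N_s, d, n_templates = 3, 8, 10               # 3 template slots, bank of 10 templates (illustrative)
rng = np.random.default_rng(2)
h_t = rng.standard_normal((N_s, d))          # latent sequence features from GTR_g2s
W_t = rng.standard_normal((d, n_templates))  # projects features onto template logits
# argmax over softmax == argmax over logits, so the softmax can be skipped here
t = (h_t @ W_t).argmax(axis=-1)              # one template id per slot
```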

Paraphrase module

  • refine templates with enriched details and possibly new case-specific findings
    • by modifying information in the templates that is not accurate for specific cases
  • convert templates into more natural and dynamic expressions
    • by robust language modeling for the same content

$$\text{h}_w = \text{GTR}_\text{gs2s}(\text{h}_u, \text{t}) \\ R = \text{argmax} \, \text{Softmax}(\text{W}_w f(\text{h}_w))$$

  • $f$: the operation of reshaping $\text{h}_w$ from $R^{N_s \times N_w \times d}$ to $R^{N_s N_w \times d}$
  • $\text{W}_w$: linear projection that transforms latent features into word embeddings
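The reshape $f$ followed by the word projection can be sketched as below ($N_s$ sentences of $N_w$ words each; all sizes illustrative).

```python
import numpy as np

N_s, N_w, d, vocab = 3, 4, 8, 20           # illustrative sizes
rng = np.random.default_rng(3)
h_w = rng.standard_normal((N_s, N_w, d))   # per-sentence, per-word latent features
W_w = rng.standard_normal((d, vocab))      # word-embedding projection
flat = h_w.reshape(N_s * N_w, d)           # f(h_w): flatten sentences into one word sequence
R = (flat @ W_w).argmax(axis=-1)           # one word id per position (argmax of softmax)
```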

Disease classification

multi-label disease classification:
$$\text{h}_z = \text{GTR}_\text{g2g}(\text{h}_u) \\ \text{z} = \text{sigmoid}(\text{W}_z \text{h}_z)$$

  • $\text{W}_z$: linear projection that transforms disease node features into one-dimensional probabilities

Learning

During paraphrasing, the retrieved templates $\text{t}$, instead of the latent features $\text{h}_t$, are used for rewriting. Sampling the templates of maximum predicted probability breaks the differentiability of back-propagation through the whole Encode-Retrieve-Paraphrase pipeline.

  1. first train the Paraphrase module with ground-truth templates
  2. then with sampled templates generated by the Retrieve module
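The two-stage schedule above can be sketched as a training loop that switches its template source instead of back-propagating through the non-differentiable argmax. Function names here are hypothetical placeholders, not the paper's code.

```python
def train_paraphrase(paraphrase_step, ground_truth_templates,
                     retrieve_fn, inputs, warmup_epochs, total_epochs):
    """Two-stage schedule: teacher forcing first, then sampled templates."""
    for epoch in range(total_epochs):
        if epoch < warmup_epochs:
            # Stage 1: train with ground-truth templates
            templates = ground_truth_templates
        else:
            # Stage 2: train with templates sampled (argmax) from Retrieve
            templates = retrieve_fn(inputs)
        paraphrase_step(templates)

# usage sketch: record which template source each epoch used
calls = []
train_paraphrase(calls.append, "gt", lambda x: "sampled", None,
                 warmup_epochs=2, total_epochs=5)
```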

 

Results


Conclusion

  • accurate attribute prediction
  • dynamic medical knowledge graph
  • explainable location references


Origin blog.csdn.net/Kqp12_27/article/details/124615783