Medical multimodal review

Original link: https://arxiv.org/abs/2307.07362

This is a review of medical multimodal learning. I focused on the semantic segmentation task and did not have time to go through the other tasks in detail; for each of them I only reproduce the paper's model summary diagram. For more details, see the paper linked above.

Dataset summary

Report generation

Report generation aims to automatically generate descriptions from EHRs and medical images.
It reduces the workload of clinicians and improves the quality of the reports themselves. Since training a report generation model typically requires medical images paired with text reports written by clinicians, it can naturally be viewed as a multimodal learning process. Representative approaches include:

1) a CNN encoder with a hierarchical LSTM decoder (sketched in the code after this list)

2) Transformer architectures

3) AlignTransformer

4) self-supervised learning techniques, such as CLIP

5) reward mechanisms to improve accuracy
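
As a concrete illustration of approach 1, here is a minimal PyTorch sketch of a CNN encoder feeding a hierarchical LSTM decoder: a sentence-level LSTM emits one topic state per sentence, and a word-level LSTM decodes each topic into words. The class name, hidden sizes, and the teacher-forcing-free decoding are my own illustrative assumptions, not taken from any specific paper.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class ReportGenerator(nn.Module):
    """Minimal CNN encoder + hierarchical LSTM decoder (illustrative sketch)."""

    def __init__(self, vocab_size, hidden=512, max_sents=6, max_words=20):
        super().__init__()
        cnn = models.resnet50(weights=None)                  # visual encoder
        self.encoder = nn.Sequential(*list(cnn.children())[:-1])
        self.img_proj = nn.Linear(2048, hidden)
        self.sent_lstm = nn.LSTMCell(hidden, hidden)         # one step per sentence
        self.word_lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.word_head = nn.Linear(hidden, vocab_size)
        self.max_sents, self.max_words = max_sents, max_words

    def forward(self, images):                               # images: (B, 3, H, W)
        v = self.img_proj(self.encoder(images).flatten(1))   # (B, hidden)
        h = torch.zeros_like(v)
        c = torch.zeros_like(v)
        sentences = []
        for _ in range(self.max_sents):
            h, c = self.sent_lstm(v, (h, c))                 # topic state for one sentence
            topic = h.unsqueeze(1).expand(-1, self.max_words, -1).contiguous()
            words, _ = self.word_lstm(topic)                 # decode topic into words
            sentences.append(self.word_head(words))          # (B, max_words, vocab)
        return torch.stack(sentences, 1)                     # (B, sents, words, vocab)

# e.g. logits = ReportGenerator(vocab_size=5000)(torch.randn(2, 3, 224, 224))
```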

Model summary

Judgment criteria

1. text quality

Refers to the readability, accuracy and validity of the text.

BLEU [19], METEOR [50], and ROUGE-L [51]
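
To make these concrete, here is a small Python sketch of how BLEU and ROUGE-L are typically computed for one generated report against a clinician-written reference, using the nltk and rouge-score packages. The report strings are made up for illustration; METEOR (available via nltk.translate.meteor_score) needs the WordNet data downloaded, so it is omitted here.

```python
# Minimal sketch: text-quality metrics for one generated report.
# Requires: pip install nltk rouge-score
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from rouge_score import rouge_scorer

reference = "no acute cardiopulmonary abnormality is seen".split()
candidate = "no acute cardiopulmonary process".split()

# BLEU-4 with smoothing; short reports often have zero 4-gram overlap,
# so an unsmoothed score would collapse to 0.
bleu = sentence_bleu([reference], candidate,
                     smoothing_function=SmoothingFunction().method1)

# ROUGE-L scores the longest common subsequence against the reference.
scorer = rouge_scorer.RougeScorer(["rougeL"])
rouge_l = scorer.score(" ".join(reference), " ".join(candidate))["rougeL"].fmeasure

print(f"BLEU: {bleu:.3f}  ROUGE-L: {rouge_l:.3f}")
```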

2. medical correctness

AUC, precision, recall, F1, RadCliQ
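
In practice these are usually computed by running a labeler (e.g., CheXbert) over both the generated and the reference reports to extract finding labels, then comparing the two label sets. Below is a hedged sklearn sketch with made-up label arrays; RadCliQ is a learned composite metric and is not reproduced here.

```python
# Sketch: medical-correctness scoring from extracted finding labels.
# In practice the labels come from a labeler such as CheXbert; the
# arrays below are made-up stand-ins (1 = finding present, 0 = absent).
import numpy as np
from sklearn.metrics import precision_recall_fscore_support, roc_auc_score

ref_labels = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0]])  # from reference reports
gen_labels = np.array([[1, 0, 0], [0, 1, 0], [1, 0, 0]])  # from generated reports
gen_scores = np.array([[0.9, 0.2, 0.4],                   # classifier scores, if available
                       [0.1, 0.8, 0.3],
                       [0.7, 0.4, 0.2]])

p, r, f1, _ = precision_recall_fscore_support(
    ref_labels, gen_labels, average="micro", zero_division=0)
auc = roc_auc_score(ref_labels, gen_scores, average="macro")
print(f"precision {p:.2f}  recall {r:.2f}  F1 {f1:.2f}  AUC {auc:.2f}")
```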

3. explainability

factENT, factENTNLI

Visual question answering

Model summary

Cross-modal retrieval

Model summary

 

Diagnostic classification

Model summary

Semantic segmentation

Semantic segmentation is used to test the effectiveness of image-text contrastive learning: visual features extracted for segmentation are juxtaposed with textual features, which probes how well the model understands the relationship between an image and its corresponding textual description (Table 6). In particular, the local alignment learned by contrastive methods is commonly evaluated with semantic segmentation.

Image-text alignment and local representation learning are commonly used semantic segmentation approaches in MDL. These techniques can improve the accuracy of the model and enable it to better understand the spatial relationships between different regions in the image, as well as the relationship between visual and textual information [119].
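
For intuition, here is a toy PyTorch sketch of word-region local alignment in the spirit of these methods: each word of a report attends over the image patches to gather a visual context, and a symmetric InfoNCE loss over the batch contrasts matched against mismatched image-report pairs. The shapes, temperature, and pooling choices are my own assumptions, not the formulation of any particular paper.

```python
import torch
import torch.nn.functional as F

def local_alignment_loss(patch_feats, word_feats, tau=0.1):
    """Toy word-region contrastive loss (illustrative only).

    patch_feats: (B, P, D) projected image region embeddings
    word_feats:  (B, W, D) projected report token embeddings
    """
    patch_feats = F.normalize(patch_feats, dim=-1)
    word_feats = F.normalize(word_feats, dim=-1)

    # For every (image i, report j) pair, each word of report j attends
    # over the patches of image i and gathers a visual context vector.
    attn = torch.einsum("jwd,ipd->ijwp", word_feats, patch_feats) / tau
    context = torch.einsum("ijwp,ipd->ijwd", attn.softmax(dim=-1), patch_feats)

    # Word-context similarity averaged over words -> one score per pair.
    logits = (context * word_feats.unsqueeze(0)).sum(-1).mean(-1) / tau  # (B, B)

    # Symmetric InfoNCE: matched pairs sit on the diagonal.
    labels = torch.arange(logits.size(0), device=logits.device)
    return 0.5 * (F.cross_entropy(logits, labels)
                  + F.cross_entropy(logits.t(), labels))

# e.g. loss = local_alignment_loss(torch.randn(4, 49, 256), torch.randn(4, 24, 256))
```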

Li et al. [120] proposed LViT, which uses medical text annotations to improve the quality of image data and to guide the generation of pseudo-labels for better segmentation performance. Müller et al. [121] designed a novel pre-training method, LoVT, aimed specifically at localized medical imaging tasks; it outperforms commonly used pre-training techniques on 10 out of 18 localization tasks.

Model summary

Datasets

SIIM

The dataset includes 12,047 chest radiographs, as well as the corresponding manual annotations

RSNA

The dataset includes 29,700 frontal radiographs for evaluating evidence of pneumonia

MS-CXR 

It consists of 1,153 image-sentence pairs with annotated bounding boxes and corresponding radiologist-verified phrases. The dataset covers eight different cardiopulmonary radiology findings.

Judgment criteria

1) Dice

2) mIoU (mean intersection over union)

3) CNR (contrast-to-noise ratio); all three are sketched in the code below
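
As promised above, here is a small NumPy sketch of the three criteria on binary masks. The CNR formulation shown (absolute mean difference between a region and its complement over the pooled standard deviation, applied to a similarity map) is one common variant used in local-alignment evaluation, not necessarily the paper's exact definition; all data below is made up.

```python
# Sketch of the three segmentation criteria on binary masks (illustrative).
import numpy as np

def dice(pred, gt, eps=1e-7):
    inter = np.logical_and(pred, gt).sum()
    return 2 * inter / (pred.sum() + gt.sum() + eps)

def miou(pred, gt, eps=1e-7):
    # Binary case: IoU averaged over foreground and background classes.
    ious = []
    for cls in (0, 1):
        p, g = pred == cls, gt == cls
        ious.append(np.logical_and(p, g).sum() / (np.logical_or(p, g).sum() + eps))
    return float(np.mean(ious))

def cnr(sim_map, gt, eps=1e-7):
    # Contrast-to-noise ratio of a similarity map against a region mask:
    # |mean inside - mean outside| / sqrt(var inside + var outside).
    inside, outside = sim_map[gt == 1], sim_map[gt == 0]
    return abs(inside.mean() - outside.mean()) / np.sqrt(
        inside.var() + outside.var() + eps)

gt = np.zeros((128, 128), dtype=int)
gt[32:96, 32:96] = 1                                  # made-up ground-truth region
pred = (np.random.rand(128, 128) > 0.5).astype(int)   # made-up prediction
print(dice(pred, gt), miou(pred, gt), cnr(np.random.rand(128, 128), gt))
```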

Source: https://blog.csdn.net/Scabbards_/article/details/131964422