Top medical journal MedIA丨YoloCurvSeg: a weakly supervised curvilinear structure segmentation algorithm using a single noisy skeleton annotation


Authors of this article丨Lin Li, Tang Xiaoying


01

Introduction

The International Conference on Medical Imaging and Computer-Aided Diagnosis (MICAD) today introduces the latest paper published in MedIA (Medical Image Analysis) by the team of Professor Tang Xiaoying from Southern University of Science and Technology and the team of Professor Kenneth K.Y. Wong from the University of Hong Kong: "YoloCurvSeg: You only label one noisy skeleton for vessel-style curvilinear structure segmentation".

Weakly-supervised learning (WSL) trains semantic segmentation networks with sparse-granularity supervision (i.e., points, boxes, scribbles), which can greatly alleviate the tension between data annotation cost and model performance, and has shown good performance in image segmentation. However, due to the limited supervision signal, this remains a very challenging task, especially when only a few labeled samples are available. In addition, almost all existing WSL segmentation methods are designed for star-shaped structures (such as organs), which differ greatly from curvilinear structures such as blood vessels and nerves. This paper proposes a new sparsely-annotated curvilinear structure segmentation framework, YoloCurvSeg.
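Scribble-style WSL methods of the kind discussed here typically compute the loss only on the sparsely annotated pixels, e.g. via a partial cross-entropy that ignores unlabeled positions. A minimal illustrative sketch (the function name, the label convention `-1` = unannotated, and the toy values are ours, not from the paper):

```python
import math

def partial_cross_entropy(probs, labels):
    # Binary cross-entropy averaged over annotated pixels only;
    # label -1 marks unannotated pixels, which contribute nothing.
    terms = [-(math.log(p) if y == 1 else math.log(1.0 - p))
             for p, y in zip(probs, labels) if y != -1]
    return sum(terms) / len(terms)

probs = [0.9, 0.2, 0.7, 0.4]   # predicted foreground probabilities
labels = [1, 0, -1, -1]        # a scribble annotates only the first two pixels
loss = partial_cross_entropy(probs, labels)
```

Because most pixels carry no supervision at all, such losses struggle on thin, sparse structures like vessels, which motivates the synthesis-based route taken below.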

YoloCurvSeg is an image-synthesis-based framework. Specifically, the background generator in the framework extracts an image background that closely matches the real distribution by inpainting the dilated noisy skeleton annotation. The extracted background image is then combined with a simulated curvilinear structure randomly generated by the foreground generator based on the space colonization algorithm, and foreground and background are fused by a multilayer patch-wise contrastive learning synthesizer. With this approach, a synthetic dataset with both images and curvilinear structure segmentation labels can be obtained at an annotation cost of only one or a few noisy skeleton annotations. Finally, the segmenter is trained using the generated dataset and, optionally, an unlabeled real image dataset. The proposed method is evaluated on four publicly available datasets (OCTA500, CORN, DRIVE and CHASEDB1), and the results show that YoloCurvSeg significantly outperforms state-of-the-art WSL segmentation methods. Using only one noisy skeleton annotation (with labeled pixels amounting to 0.14%, 0.03%, 1.40% and 0.65% of the fully supervised labels in the respective original datasets), YoloCurvSeg achieves more than 97% of fully supervised performance on each dataset. The code and datasets have been made publicly available at https://github.com/llmir/YoloCurvSeg.


Figure 1. YoloCurvSeg achieves over 97% of fully supervised performance on four representative datasets using only one noisy skeleton annotation, which means doctors can save substantial labeling time while still obtaining satisfactory segmentation results.

02

Research Content

Curvilinear structures are slender, curved, multi-scale structures, usually tree-shaped, commonly found in natural images (such as cracks and roads) and biomedical images (such as blood vessels, nerves, and cell membranes). Automatic and accurate segmentation of these curvilinear structures is of great significance in both computer vision and biomedical image analysis. For example, road mapping is a prerequisite for autonomous driving and urban planning. In the biomedical field, studies have shown that the morphology and topology of specific curvilinear anatomical structures (such as retinal blood vessels and corneal nerve fibers) are highly correlated with the presence or severity of various diseases, such as hypertension, arteriosclerosis, keratitis, age-related macular degeneration, and diabetic retinopathy. Thanks to the development of deep learning, many works have been proposed that elaborately design model architectures or add additional topological constraint losses, but they basically follow a fully supervised paradigm and require large-scale well-annotated datasets. However, collecting and labeling large-scale, completely annotated datasets is very expensive and time-consuming, especially for medical images, since their annotation requires expert knowledge and clinical experience. Moreover, annotating curvilinear structures is even more challenging because they are slender, multi-scale, complex in shape, and rich in fine detail; the time cost of annotating the curvilinear structures in a single sample is often several times or even dozens of times that of ordinary organs or structures.

Recently, researchers have made great efforts to reduce the annotation cost of training deep learning models. For example, semi-supervised learning (SSL) trains a model by combining a limited amount of annotated data with a large amount of unlabeled data. Although effective, most state-of-the-art SSL methods still require approximately 5%-30% of the data to be accurately and precisely labeled to achieve approximately 85%-95% of fully supervised performance, which is still neither cost-effective nor time-efficient when labeling curvilinear structures. Weakly supervised learning (WSL) attempts to alleviate the labeling problem from another perspective, using sparse-granularity supervision (i.e., points, scribbles, bounding boxes), and achieves good performance. However, the vast majority of WSL methods still require sparse labeling of the entire dataset (or a large part of it), and they are designed and validated on relatively simple structures (e.g., cardiac structures or abdominal organs), with assumptions and priors that may not hold for complex structures such as curvilinear ones.

To address the above problems, we propose a new WSL segmentation framework for curvilinear structures, namely YoloCurvSeg. For curvilinear structures, label noise/errors are inevitable, and a good segmentation method should be noise-tolerant. Therefore, instead of using only the annotated pixels for supervision, YoloCurvSeg cleverly transforms the weakly supervised problem into a fully supervised or semi-supervised one through image synthesis. It uses a pretrained inpainting network as the background generator, selects one (or more, depending on availability) noisy skeleton, and dilates it to serve as the inpainting mask, thereby obtaining a background that closely matches the real distribution. The extracted background is then augmented and combined with random simulated curves generated by a foreground generator based on the space colonization algorithm, from which a synthetic dataset is obtained through a multilayer patch-wise contrastive learning synthesizer. Finally, the segmenter performs two-stage coarse-to-fine segmentation using the synthetic dataset and an unlabeled dataset (if available). The main framework is shown in Figure 2:
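The "dilate the noisy skeleton into an inpainting mask" step can be illustrated with a minimal pure-Python sketch; the disk radius and the toy diagonal skeleton are illustrative assumptions, not the paper's implementation (which uses a trained inpainting network on real images):

```python
def skeleton_to_inpaint_mask(skeleton, radius=2):
    """Dilate a binary skeleton (list of lists of 0/1) with a disk of the
    given radius, so the inpainting mask covers the full vessel width
    plus a safety margin, not just the 1-pixel centerline."""
    h, w = len(skeleton), len(skeleton[0])
    mask = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not skeleton[y][x]:
                continue
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and dy * dy + dx * dx <= radius * radius:
                        mask[ny][nx] = 1
    return mask

# toy skeleton: a single diagonal line across a 16x16 image
skel = [[1 if x == y else 0 for x in range(16)] for y in range(16)]
mask = skeleton_to_inpaint_mask(skel, radius=2)
```

Inpainting the region under this dilated mask removes the curvilinear foreground entirely, leaving a clean background even when the skeleton annotation is noisy.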


Figure 2. Top: Our proposed YoloCurvSeg framework, which consists of four main components: a curve generator based on the space colonization algorithm, a background generator, a synthesizer based on multilayer patch-wise contrastive foreground-background fusion, and a two-stage coarse-to-fine segmenter. Bottom: Details of the curve generator and the curve generation process for the four datasets used.
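The space colonization algorithm behind the curve generator grows tree-like curves by repeatedly pulling branch tips toward scattered attractor points and removing attractors once reached. A minimal 2D sketch of the idea (all parameters, coordinates, and the function itself are illustrative, not the paper's settings):

```python
import math

def space_colonization(attractors, root, step=1.0, influence=100.0, kill=1.0, iters=60):
    """Grow a tree of nodes toward scattered attractor points — a minimal
    sketch of the space-colonization idea used for vessel-like curves."""
    nodes = [root]
    attractors = list(attractors)
    for _ in range(iters):
        pulls = {}  # node index -> summed unit vector toward its attractors
        for ax, ay in attractors:
            # each attractor influences only its nearest tree node
            i = min(range(len(nodes)),
                    key=lambda k: (nodes[k][0] - ax) ** 2 + (nodes[k][1] - ay) ** 2)
            nx, ny = nodes[i]
            d = math.hypot(ax - nx, ay - ny)
            if 1e-9 < d < influence:
                vx, vy = pulls.get(i, (0.0, 0.0))
                pulls[i] = (vx + (ax - nx) / d, vy + (ay - ny) / d)
        if not pulls:
            break  # every attractor has been reached
        for i, (vx, vy) in pulls.items():
            n = math.hypot(vx, vy)
            if n > 1e-9:
                nodes.append((nodes[i][0] + step * vx / n, nodes[i][1] + step * vy / n))
        # drop attractors that some node has grown close enough to
        attractors = [a for a in attractors
                      if min(math.hypot(a[0] - x, a[1] - y) for x, y in nodes) > kill]
    return nodes

tree = space_colonization([(10.0, 5.0), (8.0, 12.0), (13.0, 15.0)], root=(10.0, 0.0))
```

Rendering the resulting node tree with varying stroke widths yields vessel-style foreground curves with natural branching topology.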

Experiments on multiple medical image datasets of different modalities show that the proposed method has the following advantages:

(1) Excellent image synthesis performance: it can synthesize a variety of datasets aligned with the distributions of real curvilinear-structure medical image datasets. Representative samples are visualized in Figure 3. We also compared the intensity (histogram) distributions of the synthetic datasets with those of the real datasets, as shown in Figure 4; the synthetic data have high intensity similarity to the real images in both background and foreground. The t-SNE visualization in Figure 5 shows that each synthetic dataset is generally consistent with, and blends well into, its real counterpart.


Figure 3. Visualization of YoloCurvSeg synthetic data. From left to right: the original image overlaid with the noisy skeleton label, the dilated inpainting mask, the extracted background, the generated foreground, the synthesized image, and the synthesized image overlaid with the generated foreground.


Figure 4. Histograms of four data sets, including real data (top) and corresponding synthetic data (bottom).


Figure 5. t-SNE visualization of 4 real and synthetic datasets. CORN good and CORN poor represent the high-quality image subset and low-quality subset of CORN respectively.
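The histogram comparison in Figure 4 can be quantified, for example, with the histogram intersection measure; a minimal sketch with toy pixel values (not the paper's actual evaluation code):

```python
def histogram(pixels, bins=16):
    """Normalized intensity histogram of 8-bit grayscale pixel values."""
    counts = [0] * bins
    for p in pixels:
        counts[min(p * bins // 256, bins - 1)] += 1
    return [c / len(pixels) for c in counts]

def histogram_intersection(h1, h2):
    """1.0 = identical intensity distributions, 0.0 = fully disjoint."""
    return sum(min(a, b) for a, b in zip(h1, h2))

real = [10, 12, 200, 210, 15, 205]       # toy "real image" pixels
synthetic = [11, 13, 198, 212, 14, 207]  # toy "synthetic image" pixels
sim = histogram_intersection(histogram(real), histogram(synthetic))
```

A value near 1.0 indicates the kind of background/foreground intensity agreement the figure illustrates.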

(2) A substantial performance lead over mainstream and state-of-the-art weakly supervised segmentation methods and noisy label learning methods. Whether the entire dataset or only a single sample is labeled, YoloCurvSeg achieves usable, near-fully-supervised performance, with significant improvements over the second-best methods on multiple metrics (including DSC, ASSD, etc.) across the four datasets. Quantitative and qualitative results are shown in Tables 3-5 and Figure 6, respectively.


Table 3. Comparison with existing WSL methods on the OCTA500 and CORN datasets. The best results are shown in bold and the second-best results are underlined. FS stands for fully supervised learning.


Table 4. Comparison with existing WSL methods on the DRIVE and CHASEDB1 datasets. The best results are shown in bold and the second-best results are underlined. FS stands for fully supervised learning.


Table 5. Comparison with noisy label learning methods on OCTA500 and DRIVE. M and S represent the full mask and the noisy skeleton, respectively. FS stands for fully supervised learning. The best results are shown in bold and the second-best results are underlined.


Figure 6. Qualitative visualization comparing representative results of our coarse segmenter (without additional training on unlabeled real data) and other SOTA WSL methods in a single annotation setting.

(3) The proposed framework is noise-robust, sample-insensitive, and easily extended to various other curvilinear structures. To verify the robustness of YoloCurvSeg to the choice of the single sparsely labeled sample, we randomly selected 10 samples from each dataset and compared performance with fully supervised models trained on the same samples. As shown in Figure 7, YoloCurvSeg outperforms the fully supervised counterpart in almost all cases and provides highly stable performance decoupled from image/annotation quality, whereas the performance of the fully supervised models fluctuates widely. In addition, YoloCurvSeg's predictions have smaller variance. Both aspects show that YoloCurvSeg is sample-insensitive and reduces the risk of selecting a poor sample for labeling.

To study the impact of noisy skeleton completeness on segmentation performance, we performed partial-erasure experiments on the skeletons. Due to the low contrast between small blood vessels and the background in fundus images, such images are prone to missing annotations. We therefore selected two samples from the DRIVE dataset and erased the noisy skeleton labels of some small vessels, as shown in Figure 8. Specifically, we deleted 12.55% and 9.66% of the labeled regions on samples No. 25 and No. 38, respectively. The figure clearly shows the erased regions on the noisy skeletons and their effect on the extracted background and synthesized images. With the complete noisy skeleton annotations, the segmentation model's performance metrics (DSC, ASSD) on the two samples are (77.99, 1.71) and (77.74, 1.59), respectively; after erasing part of the noisy skeleton and synthesizing the corresponding new training set, they are (78.06, 1.83) and (78.11, 1.40). The small fluctuations before and after erasure indicate the robustness of the proposed method.
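The DSC values quoted above follow the standard Dice formula, DSC = 2|P∩G| / (|P|+|G|), reported in percent as in the tables. A toy sketch (the binary arrays are illustrative, not experiment data):

```python
def dice_score(pred, gt):
    """Dice similarity coefficient in percent:
    100 * 2|P ∩ G| / (|P| + |G|) for binary masks flattened to 0/1 lists."""
    inter = sum(1 for p, g in zip(pred, gt) if p and g)
    return 100.0 * 2 * inter / (sum(pred) + sum(gt))

pred = [1, 1, 0, 1, 0, 0]  # toy binary segmentation
gt   = [1, 1, 1, 0, 0, 0]  # toy ground truth
score = dice_score(pred, gt)
```

ASSD (average symmetric surface distance), the second metric in each pair, instead measures the mean boundary distance between prediction and ground truth, so lower is better.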

Meanwhile, to further demonstrate the scalability of the proposed method, we performed additional synthesis and segmentation validation analyses on an X-ray coronary angiography dataset (DCA1). Both the synthesis and segmentation results demonstrate the successful transfer of the method to this dataset. Other similar and potentially transferable scenarios include cell membrane, crack, road (aerial image), and leaf vein segmentation.

Finally, to better illustrate the time-saving advantage of our method in real clinical scenarios, Figure 10 plots the segmentation performance and labeling time cost of all evaluated methods (including all WSL methods, NLL methods, and YoloCurvSeg) under single-sample and full-sample conditions. Our method achieves the highest segmentation performance (≥97% of FS) with the lowest annotation time cost (<0.3% of FS) across all four tasks.


Figure 7. Performance of YoloCurvSeg (coarse stage) given different samples and their annotations in a single sample setting.


Figure 8. Experimental diagram related to missing annotation sensitivity. Visualization of inpainting masks, extracted backgrounds, and synthetic images is performed on two representative samples from the DRIVE dataset with varying degrees of completeness in the noisy skeleton labels. In each noise skeleton, red indicates missing/erased portions and red numbers indicate the missing percentage.


Figure 9. Visualization of synthetic data from the DCA1 dataset. From left to right are the original image, fully supervised labels, noise skeleton, extracted background and two synthetic sample images.


Figure 10. Performance-annotation cost trade-off diagram. Visualization of segmentation accuracy (DSC) and labeling time for all weakly supervised segmentation and noisy label learning methods as well as single- and full-sample fully supervised (FS) settings. The number and type of labels used are indicated in parentheses, and M and S represent fully supervised labels and noise skeletons respectively.

03

About the Author

About the corresponding author

Tang Xiaoying is an Assistant Professor, Associate Researcher, and Doctoral Supervisor at Southern University of Science and Technology; a Shenzhen Overseas High-Level Introduced Talent and Shenzhen Outstanding Youth; and principal investigator of a National Key R&D Program project, a National Key R&D Program Young Scientist project, a National Natural Science Foundation of China (NSFC) General project, and an NSFC Youth project. She has served as a MICCAI local chair, area chair, and session chair, is an associate editor of the journal Neural Networks, and is an IEEE Senior Member. Her research focuses on medical image analysis, AI-assisted diagnosis, and computer vision. She has published 58 SCI journal papers (34 in JCR Q1), 90 full-length conference papers, and 2 book chapters.

About the first author

Lin Li is a doctoral candidate jointly trained by the University of Hong Kong and Southern University of Science and Technology. His research interests include medical image processing and analysis, data-efficient learning, and federated learning. Under the guidance of Professor Tang Xiaoying and Professor Kenneth K.Y. Wong, he has published more than 20 papers in authoritative medical imaging journals such as MedIA and IEEE TMI and at well-known conferences in related fields such as ICCV, MICCAI, MIDL, and ISBI.

04

Funding

This work was supported by Shenzhen Basic Research Program project JCYJ20190809120205578, National Natural Science Foundation of China grant 62071210, Shenzhen Science and Technology Program project RCYX20210609103056042, Shenzhen Basic Research Program project JCYJ20200925153847004, and Shenzhen Science and Technology Innovation Commission project KCXFZ2020122117340001.

05

Related Links

Github link:

https://github.com/llmir/YoloCurvSeg

Article link: https://www.sciencedirect.com/science/article/abs/pii/S1361841523001974

https://arxiv.org/abs/2212.05566
