Rethinking Local-Global Contextual Interaction: Application of SegNetr to Medical Image Segmentation

Overview

Paper: "SegNetr: Rethinking the local-global interactions and skip connections in U-shaped networks"

This post introduces SegNetr, a lightweight medical image segmentation network that rethinks and improves two parts of the conventional encoder-decoder design: the local-global interaction and the long skip connections.

In medical image segmentation, U-Net-style networks have essentially become the mainstream. However, the authors argue that existing U-shaped segmentation networks still have the following problems:

  1. They focus on designing complex self-attention modules to compensate for the limited ability of convolutions to capture long-range context dependencies, which increases the parameter count and computational complexity of the network;
  2. They fuse encoder and decoder features too simply, ignoring the spatial correspondence between them.

To address these issues, the paper introduces the SegNetr block, a novel block that performs local-global interactions dynamically at any stage with linear complexity. The paper also designs a general information-preserving skip connection (IRSC), which preserves the spatial position information of the encoder features and fuses them accurately with the decoder features.
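A quick count shows why restricting attention to fixed-size groups of tokens gives linear complexity. The sketch below compares the number of query-key pairs scored by full self-attention with the number scored when attention is confined to non-overlapping windows. The 8x8 window size is an illustrative assumption rather than a setting from the paper, and projections, heads and channel dimensions are ignored.

```python
def full_attention_pairs(h: int, w: int) -> int:
    """Query-key pairs scored by self-attention over all h*w tokens at once."""
    n = h * w
    return n * n


def window_attention_pairs(h: int, w: int, win: int) -> int:
    """Query-key pairs when attention is restricted to non-overlapping win x win windows."""
    assert h % win == 0 and w % win == 0, "feature map must tile evenly into windows"
    num_windows = (h // win) * (w // win)
    tokens_per_window = win * win
    # Each window is a small dense attention problem of fixed size.
    return num_windows * tokens_per_window ** 2


if __name__ == "__main__":
    for side in (32, 64, 128):
        full = full_attention_pairs(side, side)
        local = window_attention_pairs(side, side, win=8)
        print(f"{side}x{side}: full={full:,} window={local:,} ratio={full // local}")
```

Doubling the side length multiplies the token count by 4: the windowed count grows by the same factor of 4 (linear in the number of tokens), while the full count grows by 16 (quadratic), so the gap keeps widening at higher resolutions.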

Finally, the paper validates the proposed method on four mainstream medical image segmentation datasets. Compared with the conventional U-Net, SegNetr reduces the parameter count by 59% and the computational cost by 76% while achieving segmentation performance comparable to state-of-the-art methods. Notably, the proposed components are plug-and-play and can easily be applied to other encoder-decoder networks to further improve their segmentation performance.

Method

The overall method is shown in Figure 1. SegNetr is a typical hierarchical U-shaped network built from two key components: the SegNetr block and the IRSC. To keep the network lightweight, the authors use MBConv as the basic convolutional building block. The SegNetr block performs dynamic local-global interactions in both the encoder and decoder stages. Patch merging halves the spatial resolution without discarding any of the original image information. Finally, the IRSC fuses encoder and decoder features to recover the detail information that the network loses as it gets deeper.
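To make the "lightweight" argument concrete, here is a rough weight count for an MBConv block (1x1 expansion, depthwise k x k convolution, squeeze-and-excitation, 1x1 projection) next to the two stacked 3x3 convolutions of a classic U-Net stage. The expansion ratio of 4 and SE ratio of 0.25 are EfficientNet-style defaults assumed for illustration; the paper's exact MBConv configuration may differ, and biases and BatchNorm parameters are ignored.

```python
def conv_params(c_in: int, c_out: int, k: int, groups: int = 1) -> int:
    """Weight count of a k x k convolution (bias and BatchNorm ignored)."""
    return (c_in // groups) * c_out * k * k


def mbconv_params(c: int, expand: int = 4, k: int = 3, se_ratio: float = 0.25) -> int:
    """Approximate MBConv weights; expand and se_ratio are assumed EfficientNet-style defaults."""
    hidden = c * expand
    total = conv_params(c, hidden, 1)                        # 1x1 pointwise expansion
    total += conv_params(hidden, hidden, k, groups=hidden)   # k x k depthwise convolution
    if se_ratio > 0:
        squeezed = max(1, int(c * se_ratio))                 # SE bottleneck width
        total += 2 * hidden * squeezed                       # SE reduce + expand (1x1 layers)
    total += conv_params(hidden, c, 1)                       # 1x1 pointwise projection
    return total


def unet_double_conv_params(c: int, k: int = 3) -> int:
    """Two stacked dense k x k convolutions, as in a classic U-Net stage."""
    return 2 * conv_params(c, c, k)


if __name__ == "__main__":
    for c in (32, 64, 128):
        print(c, mbconv_params(c), unet_double_conv_params(c))
```

With 64 channels this gives 43,264 weights for the MBConv sketch versus 73,728 for the double convolution. Because the only spatial (k x k) kernel in MBConv is depthwise, its cost is dominated by the cheap 1x1 convolutions.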

Figure: MBConv and Fused-MBConv blocks in EfficientNetV2

SegNetr Block

First, let's take a look at the SegNetr block, which is the core component of the entire SegNetr network, enabling dynamic processing of features through local-global interactions. It uses MBConv as the base convolution module, and introduces local and global branches to achieve interaction.

How are "local" and "global" contexts captured?

The local branch achieves local interaction by computing attention matrices within small non-overlapping patches. The global branch achieves global interaction by aggregating and shifting spatially discontinuous patches. The outputs of the two branches are then fused by a weighted sum. Because attention is only computed within fixed-size groups of tokens, this design keeps the computational cost low while still capturing both the local and the global information in the image.
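The two kinds of grouping can be illustrated on a toy feature map. In the sketch below, the local branch groups spatially contiguous positions into non-overlapping p x p windows, while the global branch groups positions spaced H/p rows and W/p columns apart, so each group spans the whole image. This grid (dilated) partition, similar to MaxViT's grid attention, is an assumption standing in for the paper's aggregation and shift operations; the scalar-feature attention and the fixed fusion weight `alpha` are likewise hypothetical simplifications.

```python
import math
from typing import Dict, List, Tuple

Coord = Tuple[int, int]  # (row, col) position on the feature map


def local_windows(h: int, w: int, p: int) -> List[List[Coord]]:
    """Local branch: non-overlapping p x p windows of spatially contiguous positions."""
    assert h % p == 0 and w % p == 0, "feature map must tile evenly"
    return [[(r, c) for r in range(i, i + p) for c in range(j, j + p)]
            for i in range(0, h, p) for j in range(0, w, p)]


def global_groups(h: int, w: int, p: int) -> List[List[Coord]]:
    """Global branch (assumed grid partition): p x p positions spaced h//p, w//p apart."""
    assert h % p == 0 and w % p == 0, "feature map must tile evenly"
    sh, sw = h // p, w // p
    return [[(i + r * sh, j + c * sw) for r in range(p) for c in range(p)]
            for i in range(sh) for j in range(sw)]


def attend(feat: Dict[Coord, float], groups: List[List[Coord]]) -> Dict[Coord, float]:
    """Toy dot-product attention on scalar features, computed only within each group."""
    out: Dict[Coord, float] = {}
    for group in groups:
        for q in group:
            scores = [feat[q] * feat[k] for k in group]
            peak = max(scores)  # subtract the max for numerical stability
            weights = [math.exp(s - peak) for s in scores]
            total = sum(weights)
            out[q] = sum(wt * feat[k] for wt, k in zip(weights, group)) / total
    return out


def local_global_block(feat: Dict[Coord, float], h: int, w: int, p: int,
                       alpha: float = 0.5) -> Dict[Coord, float]:
    """Fuse both branches with a weighted sum; alpha is a hypothetical fixed weight."""
    loc = attend(feat, local_windows(h, w, p))
    glo = attend(feat, global_groups(h, w, p))
    return {pos: alpha * loc[pos] + (1 - alpha) * glo[pos] for pos in feat}


if __name__ == "__main__":
    print(local_windows(4, 4, 2)[0])   # [(0, 0), (0, 1), (1, 0), (1, 1)]
    print(global_groups(4, 4, 2)[0])   # [(0, 0), (0, 2), (2, 0), (2, 2)]
```

Both partitions produce groups of exactly p x p tokens, so each branch has the same fixed per-group cost and the total grows linearly with the number of positions.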

IRSC

The information-preserving skip connection (IRSC) fuses encoder and decoder features using two operations, Patch Merging and Patch Reverse. Patch Merging reduces the spatial resolution of the input feature map while expanding its channel dimension, so that high-resolution details are retained. Patch Reverse then restores the spatial resolution of the encoder features so that they can be fused with the upsampled decoder features. This helps recover the fine details and position information of the feature maps and improves segmentation accuracy.
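The rearrangement behind Patch Merging and Patch Reverse can be sketched as a space-to-depth transform and its exact inverse. The version below works on nested Python lists indexed as [channel][row][col]. It omits any learned projection and whatever processing the IRSC applies between the two steps, and the channel-concatenation fusion in the hypothetical `fuse_skip` helper is an assumed choice, not the paper's confirmed fusion operator.

```python
from typing import List

Tensor3D = List[List[List[float]]]  # indexed as [channel][row][col]
OFFSETS = ((0, 0), (1, 0), (0, 1), (1, 1))  # (dy, dx) positions inside each 2x2 patch


def patch_merge(x: Tensor3D) -> Tensor3D:
    """Space-to-depth: (C, H, W) -> (4C, H/2, W/2); every input value is kept."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    assert h % 2 == 0 and w % 2 == 0, "H and W must be even"
    # Output channel g*C + ch holds offset OFFSETS[g] of input channel ch.
    return [[[x[ch][2 * i + dy][2 * j + dx] for j in range(w // 2)] for i in range(h // 2)]
            for dy, dx in OFFSETS for ch in range(c)]


def patch_reverse(y: Tensor3D) -> Tensor3D:
    """Depth-to-space: exact inverse of patch_merge, (4C, H/2, W/2) -> (C, H, W)."""
    c4, h2, w2 = len(y), len(y[0]), len(y[0][0])
    assert c4 % 4 == 0, "channel count must be a multiple of 4"
    c = c4 // 4
    x = [[[0.0] * (2 * w2) for _ in range(2 * h2)] for _ in range(c)]
    for g, (dy, dx) in enumerate(OFFSETS):
        for ch in range(c):
            for i in range(h2):
                for j in range(w2):
                    x[ch][2 * i + dy][2 * j + dx] = y[g * c + ch][i][j]
    return x


def fuse_skip(encoder_feat: Tensor3D, decoder_feat: Tensor3D) -> Tensor3D:
    """Hypothetical fusion: concatenate along channels; spatial sizes must match."""
    assert len(encoder_feat[0]) == len(decoder_feat[0])
    assert len(encoder_feat[0][0]) == len(decoder_feat[0][0])
    return encoder_feat + decoder_feat


if __name__ == "__main__":
    enc = [[[float(r * 4 + c) for c in range(4)] for r in range(4)]]  # 1 x 4 x 4
    merged = patch_merge(enc)            # 4 x 2 x 2, no value discarded
    assert patch_reverse(merged) == enc  # lossless round trip
    dec_up = [[[0.0] * 4 for _ in range(4)]]  # stand-in for upsampled decoder features
    fused = fuse_skip(patch_reverse(merged), dec_up)
    print(len(fused), len(fused[0]), len(fused[0][0]))  # 2 4 4
```

Because `patch_reverse(patch_merge(x)) == x` for any even-sized input, halving the resolution this way discards no values; the spatial detail is only moved into the channel dimension.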

Experiments

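First, on the ISIC2017 dataset, SegNetr and TransUNet achieve the highest IoU (0.775), 3.9% higher than the baseline U-Net. Even SegNetr-S, which has fewer parameters, achieves segmentation performance similar to UNeXt-L. On the PH2 dataset, the Transformer-based Swin-UNet has the worst segmentation performance, which is directly related to the amount of data in the target dataset. The proposed method achieves the best segmentation performance on this dataset while keeping the computational overhead low. Although it uses window-shift-based attention, convolutional neural networks have a stronger inductive bias, so the method depends less on the amount of data than Transformer-based methods such as Swin-UNet or TransUNet.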

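In Table 2, the authors compare the IoU and Dice of SegNetr with those of the dual-encoder FATNet: SegNetr's IoU and Dice are 1.6% and 0.8% higher than FATNet's, respectively, while its computational cost is 32.65 GFLOPs lower. On the ACDC dataset, the left ventricle is relatively easy to segment; U-Net reaches an IoU of 0.861, which is still 1.1% below SegNetr. The myocardium lies between the left and right ventricles and has a ring-shaped pattern; here the proposed method's IoU is 0.6% higher than that of EANet, which focuses on boundary segmentation. In addition, comparing the segmentation performance of the four networks UNeXt, UNeXt-L, SegNetr-S and SegNetr suggests that a smaller parameter count may limit a network's learning capacity.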
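For readers comparing the IoU and Dice figures, the sketch below computes both metrics for a pair of flattened binary masks. For a single prediction-target pair they are linked by Dice = 2·IoU / (1 + IoU), so at high scores a given IoU gain corresponds to a smaller Dice gain; dataset-level averages reported in papers need not satisfy this identity exactly. Treating two empty masks as a perfect match is an assumed convention.

```python
from typing import Sequence, Tuple


def iou_and_dice(pred: Sequence[int], target: Sequence[int]) -> Tuple[float, float]:
    """Return (IoU, Dice) for two flattened binary masks of equal length."""
    assert len(pred) == len(target), "masks must have the same number of pixels"
    inter = sum(1 for p, t in zip(pred, target) if p and t)   # |P ∩ T|
    union = sum(1 for p, t in zip(pred, target) if p or t)    # |P ∪ T|
    size_sum = sum(1 for p in pred if p) + sum(1 for t in target if t)  # |P| + |T|
    if union == 0:  # both masks empty: treated as a perfect match (assumed convention)
        return 1.0, 1.0
    return inter / union, 2 * inter / size_sum


if __name__ == "__main__":
    iou, dice = iou_and_dice([1, 1, 1, 0, 0, 0], [0, 1, 1, 1, 0, 0])
    print(iou, round(dice, 4), round(2 * iou / (1 + iou), 4))  # 0.5 0.6667 0.6667
```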

As the figure shows, SegNetr accurately delineates skin lesions even with limited data and performs multi-class segmentation with minimal under- and over-segmentation.

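Summary

SegNetr improves the segmentation performance of U-shaped networks by introducing the SegNetr block and the information-preserving skip connection. The SegNetr block produces better feature representations through local-global interactions, while the information-preserving skip connection provides a better feature-fusion mechanism. Together, these components allow SegNetr to reduce computational complexity while achieving segmentation performance comparable to, or better than, conventional methods.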

Final Words

If you are interested in research on deep learning for medical imaging, you are welcome to add the editor on WeChat (account: cv_huber) with a note in the format "school/company - research direction - nickname" to exchange ideas with other readers.


Source: juejin.im/post/7266336495031271460