Resources

Paper: https://arxiv.org/pdf/1703.02719.pdf

PyTorch code: pytorch-semantic-segmentation/gcn.py at master · zijundeng/pytorch-semantic-segmentation · GitHub

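Contributions of the paper:

1. It proposes the Global Convolutional Network (GCN) architecture;

2. It introduces a Boundary Refinement block that further sharpens segmentation boundaries for better results;

3. It reports state-of-the-art results, at the time of publication, on PASCAL VOC 2012 (82.2%) and Cityscapes (76.9%).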

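Network model analysis

If we break the semantic segmentation task down, semantic segmentation = classification + localization. The two subtasks pull in opposite directions, however: classification wants the model to be insensitive to transformations (such as flips and rotations), while localization wants it to be sensitive to them so that every pixel can be segmented precisely.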

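Where the GCN design comes from

1. From the localization point of view, the model should be fully convolutional to preserve localization performance, and it should not use fully connected or global pooling layers, because those layers discard location information;

2. From the classification point of view, the architecture should use a large kernel size so that feature maps and per-pixel classifiers are densely connected, which strengthens the ability to handle different transformations.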

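As the figure above shows, in earlier work the receptive field may be as large as the input image (A), yet once the image is resized to a larger scale the receptive field can no longer cover the whole bird (B). The GCN proposed in the paper tolerates this larger scaling: the object still falls inside the receptive field (C).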

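Overall network architecture

The model uses ResNet as the feature extractor and an FCN-style structure as the segmentation framework. Multi-scale features are taken from feature maps of different sizes. A GCN is applied to the feature map at each level to extract global information, and the higher-level feature maps are upsampled to supply semantic information before everything is fused into the final prediction. The network also introduces a BR (Boundary Refinement) module with a residual structure (Figure 2(C) above) to learn boundary information.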

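GCN architecture

Using a large-kernel convolution directly would inevitably increase the computational cost. To balance accuracy against computation, the authors use asymmetric convolutions (an idea from Inception v2): one branch applies a k×1 convolution followed by a 1×k convolution, a second branch applies them in the opposite order, and the two outputs are summed. This reaches the receptive field of a k×k kernel at a lower computational cost.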
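
To make the saving concrete, here is a rough parameter count comparing a dense k×k convolution with the two-branch asymmetric design described above. It is only a back-of-the-envelope sketch: the helper names are made up for illustration, and the channel numbers (2048 input channels, 21 output classes) are assumptions rather than values taken from the referenced code.

```python
def dense_conv_params(c_in, c_out, k):
    """Weights plus biases of a single dense k x k convolution."""
    return k * k * c_in * c_out + c_out


def gcn_params(c_in, c_out, k):
    """Weights plus biases of a two-branch asymmetric block.

    Each branch stacks a k x 1 and a 1 x k convolution (in either order):
    the first layer maps c_in -> c_out, the second maps c_out -> c_out.
    """
    first = k * c_in * c_out + c_out
    second = k * c_out * c_out + c_out
    return 2 * (first + second)


if __name__ == "__main__":
    # Assumed sizes for illustration only: a 2048-channel feature map
    # and 21 output classes.
    c_in, c_out = 2048, 21
    for k in (3, 7, 15):
        dense = dense_conv_params(c_in, c_out, k)
        gcn = gcn_params(c_in, c_out, k)
        print(f"k={k:2d}  dense={dense:>10,}  gcn={gcn:>10,}  ratio={gcn / dense:.3f}")
```

When the input has far more channels than the output, the ratio is close to 2/k, so the advantage grows with the kernel size.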

Code:

# many are borrowed from https://github.com/ycszen/pytorch-ss/blob/master/gcn.py
import torch.nn as nn


class _GlobalConvModule(nn.Module):
    def __init__(self, in_dim, out_dim, kernel_size):
        super(_GlobalConvModule, self).__init__()
        # integer division: nn.Conv2d expects integer padding
        pad0 = (kernel_size[0] - 1) // 2
        pad1 = (kernel_size[1] - 1) // 2
        # kernel size had better be odd number so as to avoid alignment error
        self.conv_l1 = nn.Conv2d(in_dim, out_dim, kernel_size=(kernel_size[0], 1),
                                 padding=(pad0, 0))  # left branch: k x 1 conv
        self.conv_l2 = nn.Conv2d(out_dim, out_dim, kernel_size=(1, kernel_size[1]),
                                 padding=(0, pad1))  # left branch: 1 x k conv
        self.conv_r1 = nn.Conv2d(in_dim, out_dim, kernel_size=(1, kernel_size[1]),
                                 padding=(0, pad1))  # right branch: 1 x k conv
        self.conv_r2 = nn.Conv2d(out_dim, out_dim, kernel_size=(kernel_size[0], 1),
                                 padding=(pad0, 0))  # right branch: k x 1 conv

    def forward(self, x):
        x_l = self.conv_l1(x)
        x_l = self.conv_l2(x_l)
        x_r = self.conv_r1(x)
        x_r = self.conv_r2(x_r)
        x = x_l + x_r  # sum the two branches
        return x
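
The comment in the code about preferring odd kernel sizes can be checked with a little arithmetic. The sketch below uses the standard output-size formula for a convolution with dilation 1 and the same (k - 1) // 2 padding rule as the module. The helper name conv_out_size and the 32-pixel input are illustrative assumptions.

```python
def conv_out_size(size, k, pad, stride=1):
    """Output length of a convolution along one axis (dilation 1)."""
    return (size + 2 * pad - k) // stride + 1


if __name__ == "__main__":
    size = 32  # assumed feature-map side length, for illustration
    for k in (7, 8):
        pad = (k - 1) // 2  # same rule as in the module above
        print(f"k={k}  pad={pad}  out={conv_out_size(size, k, pad)}")
```

With an odd k the spatial size is preserved. With an even k the output comes out one pixel smaller along each axis, so it would no longer line up with other feature maps of the original size when they are added together.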

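Boundary Refinement structure

The network uses Boundary Refinement blocks in many places. Each block is a residual structure, shown in the figure below: the w × h × 21 input at the top is the coarse score map, the residual branch on the side learns a correction for the boundaries, and adding the two together produces the boundary-refined result.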

Code:

class _BoundaryRefineModule(nn.Module):
    def __init__(self, dim):
        super(_BoundaryRefineModule, self).__init__()
        self.relu = nn.ReLU(inplace=True)
        self.conv1 = nn.Conv2d(dim, dim, kernel_size=3, padding=1)  # residual branch: 3x3 conv
        self.conv2 = nn.Conv2d(dim, dim, kernel_size=3, padding=1)  # residual branch: 3x3 conv

    def forward(self, x):
        residual = self.conv1(x)
        residual = self.relu(residual)  # Conv + ReLU
        residual = self.conv2(residual)  # Conv
        out = x + residual  # add the residual to the input score map
        return out
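
A quick way to see the residual behaviour is to run a block of this shape on a dummy score map. The snippet below is a self-contained sketch that assumes PyTorch is installed. The class name BoundaryRefine, the 32×32 input and the random scores are all made up for illustration, and the block is a compact restatement of the module above rather than the repository's code.

```python
import torch
import torch.nn as nn


class BoundaryRefine(nn.Module):
    """Illustrative restatement: out = x + Conv(ReLU(Conv(x)))."""

    def __init__(self, dim):
        super().__init__()
        self.branch = nn.Sequential(
            nn.Conv2d(dim, dim, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(dim, dim, kernel_size=3, padding=1),
        )

    def forward(self, x):
        return x + self.branch(x)


torch.manual_seed(0)
br = BoundaryRefine(dim=21)
scores = torch.randn(1, 21, 32, 32)  # hypothetical coarse score map

with torch.no_grad():
    refined = br(scores)
    # Zeroing the last conv makes the residual branch output zero,
    # so the block reduces to the identity mapping.
    nn.init.zeros_(br.branch[2].weight)
    nn.init.zeros_(br.branch[2].bias)
    identity_out = br(scores)

print(refined.shape)                         # same shape as the input
print(torch.allclose(identity_out, scores))  # residual branch contributes nothing
```

Because the block only adds a correction to its input, it keeps the shape of the score map, and it passes the coarse scores through unchanged whenever the residual branch outputs zero.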


