Dual Attention Network for Scene Segmentation: attention from two modules

Theoretical Description

Deficiencies of other methods:

  1. Methods that capture feature dependencies through multi-layer feature fusion, LSTMs, or graph propagation are inefficient.

  2. If context is embedded by stacking many convolution layers, the features of conspicuous objects override those of inconspicuous objects and hurt their recognition.

    During repeated convolution and pooling, inconspicuous features are gradually replaced by conspicuous ones, so should this be dealt with right from the start?

Advantages of DANet:

  1. The authors' approach can selectively aggregate similar features for inconspicuous objects, making those features more distinct and avoiding the influence of conspicuous objects.

    An inconspicuous object has only a small response on its own and may not stand out, but after a weighted sum over all features, the features of the inconspicuous objects become conspicuous.
    But wouldn't the features of salient objects then become even more conspicuous? In that case the influence of the background seems to fade, since its similarity to the objects is lower.
    Different features live on different channels, with each channel responsible for some features, so with attention applied these features should become more distinct?

  2. The method adaptively aggregates similar features at any scale from a global perspective, which helps when the same object class has to be detected at different scales.

    The network has nothing like an explicit multi-scale structure; it just uses the two attention modules and still gets good results, and in the end I do not know why.


Method Description


The two attention modules are added to a scene segmentation network. The network pipeline is:

  1. The image is processed by a ResNet; in the last two ResNet blocks the down-sampling is removed and dilated (atrous) convolutions are used instead, so more detail is retained without extra parameters (see the backbone sketch after this list).
  2. These features are then fed into the two attention modules in parallel; the outputs of the two modules are fused to further improve the feature representation and obtain more accurate results.
  3. The fused features go into the dilated FCN head, which ends the whole network.
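
A minimal sketch of step 1 only, assuming a torchvision ResNet-50 backbone (my own illustration, not the authors' code): `replace_stride_with_dilation` drops the down-sampling in the last two stages and uses dilated convolutions instead, so the feature map stays at 1/8 of the input resolution.

import torch
from torch import nn
from torchvision.models import resnet50

# replace the stride of the last two ResNet stages with dilation (output stride 8 instead of 32)
backbone = resnet50(replace_stride_with_dilation=[False, True, True])
features = nn.Sequential(*list(backbone.children())[:-2])  # drop the avgpool and fc layers
out = features(torch.randn(1, 3, 224, 224))
print(out.shape)  # torch.Size([1, 2048, 28, 28]) -> 224 / 8 = 28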

The two attention modules act on position and channel respectively. Each attention module works in three steps:

  1. Generate a channel (position) attention matrix that models the relationship between any two channels (positions) of the feature.
  2. Multiply the original feature with the attention matrix (a matrix multiplication).
  3. Add a residual (add back the original feature x).

A channel can be seen as the response of a specific class.

Attention module formulas

Let the feature fed into the attention module be $A \in R^{C \times H \times W}$.

$$x_{ji} = \frac{\exp(A_i \cdot A_j)}{\sum^{C}_{i=1} \exp(A_i \cdot A_j)}$$

$X \in R^{C \times C}$ stores the correlation coefficients between channels.

$$E_j = \beta \sum^{C}_{i=1} (x_{ji} A_i) + A_j$$

These formulas are written for the channel attention; the position attention is analogous, with the sum running over the $N = H \times W$ positions instead of the $C$ channels. Afterwards a convolution is applied to each attention module's output, and the two results are added element-wise.
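
A tiny numeric check of the channel attention formulas above (my own sketch, with a random A and a fixed value standing in for the learnable β); it follows the three steps listed earlier: build the C×C attention matrix, multiply it with the feature, add the residual.

import torch

C, H, W = 4, 3, 3
A = torch.randn(C, H, W)
A_flat = A.view(C, H * W)              # row i is channel A_i, flattened
energy = A_flat @ A_flat.t()           # (C, C), entry (j, i) is the dot product A_j . A_i
X = torch.softmax(energy, dim=-1)      # x_{ji}, each row sums to 1
beta = 0.5                             # stands in for the learnable coefficient beta
E = beta * (X @ A_flat) + A_flat       # E_j = beta * sum_i x_{ji} A_i + A_j
print(X.shape, E.view(C, H, W).shape)  # torch.Size([4, 4]) torch.Size([4, 3, 3])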


Code implementation

Slightly different from non-local:

  1. In non-local, $\theta$, $\phi$ and $g$ all reduce the number of channels to half, and a final $W$ changes the channel count back so that it matches the input (a minimal non-local sketch follows this list).
  2. In the DANet position module here, $\theta$ and $\phi$ reduce the number of channels while $g$ does not, so the output already matches the input channels without any $W$; $W$ is also replaced by a single learnable coefficient $\gamma$. The channel module needs no projections at all, since its attention matrix is over the channels themselves.
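
For comparison, here is a minimal non-local block along the lines of point 1 (my own sketch based on the description above, not the official non-local code): $\theta$, $\phi$, $g$ all halve the channels, and a final 1×1 convolution $W$ restores the channel count before the residual addition. The DANet modules follow below.

import torch
from torch import nn


class NonLocalBlock(nn.Module):
    def __init__(self, input_dim):
        super(NonLocalBlock, self).__init__()
        self.inter_dim = max(input_dim // 2, 1)  # theta / phi / g halve the channels
        self.theta = nn.Conv2d(input_dim, self.inter_dim, kernel_size=1)
        self.phi = nn.Conv2d(input_dim, self.inter_dim, kernel_size=1)
        self.g = nn.Conv2d(input_dim, self.inter_dim, kernel_size=1)
        self.W = nn.Conv2d(self.inter_dim, input_dim, kernel_size=1)  # restore the channel count

    def forward(self, x):
        batch, channel, H, W = x.size()
        theta_x = self.theta(x).view(batch, self.inter_dim, H * W).permute(0, 2, 1)  # (B, HW, C')
        phi_x = self.phi(x).view(batch, self.inter_dim, H * W)                       # (B, C', HW)
        attention = torch.softmax(torch.bmm(theta_x, phi_x), dim=-1)                 # (B, HW, HW)
        g_x = self.g(x).view(batch, self.inter_dim, H * W).permute(0, 2, 1)          # (B, HW, C')
        y = torch.bmm(attention, g_x).permute(0, 2, 1).contiguous().view(batch, self.inter_dim, H, W)
        return self.W(y) + x  # W maps back to input_dim, then the residual is added
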
import torch
from torch import nn


class PAM(nn.Module):
    """Position attention module: attention over all H*W spatial positions."""
    def __init__(self, input_dim):
        super(PAM, self).__init__()
        self.input_dim = input_dim
        # theta / phi reduce the channels to 1/8 (at least 1); g keeps all channels,
        # so no extra conv W is needed before the residual addition
        self.inter_dim = max(input_dim // 8, 1)

        self.theta = nn.Conv2d(self.input_dim, self.inter_dim, kernel_size=1)
        self.phi = nn.Conv2d(self.input_dim, self.inter_dim, kernel_size=1)
        self.g = nn.Conv2d(self.input_dim, self.input_dim, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learnable residual weight, starts at 0

    def forward(self, x):
        batch, channel, H, W = x.size()
        # (B, C', H, W) -> (B, H*W, C'): one row per spatial position
        theta_x = self.theta(x).view(batch, self.inter_dim, H * W).permute(0, 2, 1)
        # (B, C', H*W)
        phi_x = self.phi(x).view(batch, self.inter_dim, H * W)
        # (B, H*W, H*W): similarity between every pair of positions
        f = torch.bmm(theta_x, phi_x)
        attention = torch.softmax(f, dim=-1)
        # (B, C, H*W) -> (B, H*W, C)
        g_x = self.g(x).view(batch, self.input_dim, H * W).permute(0, 2, 1)
        # weighted sum over positions, reshaped back to (B, C, H, W)
        y = torch.bmm(attention, g_x).permute(0, 2, 1).contiguous().view(batch, self.input_dim, H, W)
        z = self.gamma * y + x  # residual connection
        return z


class CAM(nn.Module):
    """Channel attention module: attention between the C channels (X in R^{C x C})."""
    def __init__(self, input_dim):
        super(CAM, self).__init__()
        self.input_dim = input_dim
        # following the paper, no 1x1 projections are used here, so the attention
        # matrix is C x C and the residual keeps the same channel count as the input
        self.gamma = nn.Parameter(torch.zeros(1))  # learnable residual weight, starts at 0

    def forward(self, x):
        batch, channel, H, W = x.size()
        # (B, C, H*W) and (B, H*W, C)
        theta_x = x.view(batch, channel, H * W)
        phi_x = x.view(batch, channel, H * W).permute(0, 2, 1)
        # (B, C, C): similarity between every pair of channels
        f = torch.bmm(theta_x, phi_x)
        # this max-subtraction also appears in the official implementation;
        # I do not know what it is for, and it makes the forward pass slower
        f = torch.max(f, dim=-1, keepdim=True)[0].expand_as(f) - f
        attention = torch.softmax(f, dim=-1)
        # weighted sum over channels, reshaped back to (B, C, H, W)
        g_x = x.view(batch, channel, H * W)
        y = torch.bmm(attention, g_x).view(batch, channel, H, W)
        z = self.gamma * y + x  # residual connection
        return z

class DANetHead(nn.Module):
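    """Segmentation head: each branch first reduces the channels with a 3x3 conv + BN + ReLU,
    then PAM and CAM run in parallel, the two branches are summed element-wise, and three
    predictions are returned (PAM branch, CAM branch, fused sum)."""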
    def __init__(self, in_channel, out_channel):
        super(DANetHead, self).__init__()
        self.in_channel = in_channel
        self.out_channel = out_channel
        self.inter_channel = in_channel // 4
        if self.inter_channel == 0:
            self.inter_channel = 1

        self.conv_res_p = nn.Sequential(nn.Conv2d(self.in_channel, self.inter_channel, kernel_size=3, padding=1, bias=False),
                                        nn.BatchNorm2d(self.inter_channel),
                                        nn.ReLU())
        self.conv_res_c = nn.Sequential(nn.Conv2d(self.in_channel, self.inter_channel, kernel_size=3, padding=1, bias=False),
                                        nn.BatchNorm2d(self.inter_channel),
                                        nn.ReLU())

        self.PA = PAM(self.inter_channel)
        self.CA = CAM(self.inter_channel)

        self.conv_before_sum_p = nn.Sequential(nn.Conv2d(self.inter_channel,self.inter_channel, kernel_size=3, padding=1, bias=False),
                                             nn.BatchNorm2d(self.inter_channel),
                                             nn.ReLU())
        self.conv_before_sum_c = nn.Sequential(nn.Conv2d(self.inter_channel,self.inter_channel, kernel_size=3, padding=1, bias=False),
                                             nn.BatchNorm2d(self.inter_channel),
                                             nn.ReLU())

        self.conv_after_sum_p = nn.Sequential(nn.Dropout2d(0.1, False),
                                              nn.Conv2d(self.inter_channel, self.out_channel, kernel_size=1, bias=False))
        self.conv_after_sum_c = nn.Sequential(nn.Dropout2d(0.1, False),
                                              nn.Conv2d(self.inter_channel, self.out_channel, kernel_size=1, bias=False))
        self.conv_sum_p_c = nn.Sequential(nn.Dropout2d(0.1, False),
                                              nn.Conv2d(self.inter_channel, self.out_channel, kernel_size=1, bias=False))

    def forward(self, x):
        batch, channel, H, W = x.size()

        feat_4p = self.conv_res_p(x)
        feat_4c = self.conv_res_c(x)

        feat_p = self.PA(feat_4p)
        feat_c = self.CA(feat_4c)

        feat_p_before_sum = self.conv_before_sum_p(feat_p)
        feat_c_before_sum = self.conv_before_sum_c(feat_c)

        feat_sum = feat_p_before_sum + feat_c_before_sum

        p_get_loss = self.conv_after_sum_p(feat_p_before_sum)
        c_get_loss = self.conv_after_sum_c(feat_c_before_sum)
        feat_result = self.conv_sum_p_c(feat_sum)

        # keep the first two outputs for the auxiliary losses; the third is the fused prediction
        output = [p_get_loss, c_get_loss, feat_result]

        return output


if __name__ == "__main__":
    x = torch.randn((2,8,64,64))
    model = DANetHead(8,1)
    z = model(x)
    print('z size :',z[2].size())
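    # sanity check (my own addition): both attention modules keep the input shape,
    # since each ends with a residual connection back onto its input
    pam, cam = PAM(16), CAM(16)
    t = torch.randn(2, 16, 32, 32)
    print('PAM out:', pam(t).size(), 'CAM out:', cam(t).size())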
