[Deep Learning Attention Mechanism Series] - SCSE Attention Mechanism (with PyTorch implementation)

The SCSE attention module comes from the paper [1803.02579] Concurrent Spatial and Channel Squeeze & Excitation in Fully Convolutional Networks (arxiv.org). It builds on the SE attention module and proposes three variants: cSE, sSE, and scSE. These modules recalibrate feature maps to enhance meaningful features and suppress less useful ones. Below, the three attention modules are explained one by one.

1. cSE module (SE attention mechanism of channel dimension)

(Figure: cSE module structure, from the original paper)

The cSE module introduces a channel attention mechanism that integrates and enhances feature information along the channel dimension, much like the classic SE channel attention module. Its key characteristic is that the attention weights are first reduced in dimensionality and then expanded back, an operation similar to the bottleneck structure in ResNet and to the truncated-SVD-accelerated fully connected layers in the Fast R-CNN detector. This reduce-then-expand pattern can be viewed as a low-rank (SVD-like) factorization; it is very common in deep learning models because it integrates channel information while simplifying the module, reducing computation, and improving speed.
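As a rough illustration of the savings (not from the original post; the channel count 256 and reduction ratio 16 are just illustrative values), the two reduce-then-expand 1×1 convolutions use far fewer weights than a single full channel-to-channel mapping would:

# Hypothetical parameter-count comparison for the reduce-then-expand design (bias terms ignored).
C, r = 256, 16                              # channels and reduction ratio (illustrative values)
bottleneck = C * (C // r) + (C // r) * C    # weights of the 1x1 reduce + 1x1 expand convolutions
full = C * C                                # weights of a single full C-to-C 1x1 convolution
print(bottleneck, full)                     # 8192 65536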

Implementation mechanism :

  • Pass the feature map through a global average pooling layer, changing its shape from [C, H, W] to [C, 1, 1].
  • Use two 1×1 convolutions to process the information (i.e., a dimensionality-reduction followed by a dimensionality-expansion operation), finally obtaining a C-dimensional vector.
  • Apply the sigmoid function for normalization to obtain the corresponding weight vector.
  • Finally, multiply the original feature map channel-wise by this weight vector to obtain the feature map recalibrated with channel information.

Code implementation :

import torch.nn as nn


class CSE(nn.Module):
    """Channel squeeze-and-excitation (cSE) block."""

    def __init__(self, in_channels, reduction=16):
        super(CSE, self).__init__()
        self.cSE = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                              # [B, C, H, W] -> [B, C, 1, 1]
            nn.Conv2d(in_channels, in_channels // reduction, 1),  # channel reduction
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels // reduction, in_channels, 1),  # channel expansion
            nn.Sigmoid(),                                         # weights in (0, 1)
        )

    def forward(self, x):
        # Broadcast the [B, C, 1, 1] weights over H and W.
        return x * self.cSE(x)
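For intuition, the channel weights produced inside the block have shape [B, C, 1, 1] and are broadcast over the spatial dimensions. A small illustrative check (not from the original post; the tensor sizes are arbitrary):

import torch

cse = CSE(64)                            # hypothetical instantiation of the block defined above
x = torch.randn(2, 64, 32, 32)           # batch of 2, 64 channels, 32x32 feature map
print(cse.cSE(x).shape)                  # torch.Size([2, 64, 1, 1])   -- channel weights
print(cse(x).shape)                      # torch.Size([2, 64, 32, 32]) -- shape is unchanged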

2. sSE module (SE attention mechanism of spatial dimension)

(Figure: sSE module structure, from the original paper)

The sSE module enhances and integrates information along the spatial dimensions of the feature map. As in the channel case, it first extracts weight information and then multiplies it with the original feature map to obtain the attention-enhanced result. The difference lies in how the weights are extracted over the spatial dimensions: instead of a global average pooling layer, it uses a convolutional layer with one output channel and a 1×1 kernel to integrate the information.

Here we briefly introduce the role of the 1×1 convolutional layer :

① Change the number of channels (i.e., dimensionality expansion and reduction)

② Integrate information (enables cross-channel information interaction)

③ Increase nonlinearity (combined with a nonlinear activation function, it deepens the model without changing the spatial resolution)
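As a small illustration of these points (an assumed example, not from the original post; the channel counts are arbitrary), a 1×1 convolution changes the channel count and mixes channel information while leaving the spatial size untouched:

import torch
import torch.nn as nn

x = torch.randn(1, 64, 32, 32)               # a 64-channel feature map
conv1x1 = nn.Conv2d(64, 1, kernel_size=1)    # mixes all 64 channels into a single output channel
print(conv1x1(x).shape)                      # torch.Size([1, 1, 32, 32]) -- spatial size unchanged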

Implementation mechanism :

  • Pass the feature map through a convolutional layer with one output channel and a 1×1 kernel to obtain a weight matrix of shape (1, H, W).
  • Normalize the weight matrix with the sigmoid function to get the final weight map.
  • Multiply the weight map with the original feature map along the spatial dimensions to obtain the spatially enhanced feature map.

Code implementation :

class SSE(nn.Module):
    """Spatial squeeze-and-excitation (sSE) block."""

    def __init__(self, in_channels):
        super(SSE, self).__init__()
        # A 1x1 convolution squeezes the channels into a single spatial weight map.
        self.sSE = nn.Sequential(nn.Conv2d(in_channels, 1, 1), nn.Sigmoid())

    def forward(self, x):
        # Broadcast the [B, 1, H, W] weights over the channel dimension.
        return x * self.sSE(x)
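Analogously to the cSE case, the spatial weights have shape [B, 1, H, W] and are broadcast over all channels. A small illustrative check (not from the original post; the tensor sizes are arbitrary):

import torch

sse = SSE(64)                            # hypothetical instantiation of the block defined above
x = torch.randn(2, 64, 32, 32)
print(sse.sSE(x).shape)                  # torch.Size([2, 1, 32, 32])   -- spatial weight map
print(sse(x).shape)                      # torch.Size([2, 64, 32, 32])  -- shape is unchanged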

3. scSE module (SE attention mechanism with mixed dimensions)

(Figure: scSE module structure, from the original paper)

The scSE module combines the sSE and cSE modules: it integrates and enhances information along the spatial and channel dimensions at the same time, and then adds the two recalibrated feature maps element-wise (the result has the same dimensions as the original feature map).

Implementation mechanism :

  • Pass the feature map through the cSE module to obtain feature map result 1.
  • Pass the feature map through the sSE module to obtain feature map result 2.
  • Add results 1 and 2 element-wise to obtain the final recalibrated result (the feature map dimensions remain unchanged throughout).

Code implementation :

class SCSE(nn.Module):
    """Concurrent spatial and channel squeeze-and-excitation (scSE) block."""

    def __init__(self, in_channels, reduction=16):
        super(SCSE, self).__init__()
        # Channel attention branch (cSE): squeeze spatially, excite channels.
        self.cSE = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_channels, in_channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels // reduction, in_channels, 1),
            nn.Sigmoid(),
        )
        # Spatial attention branch (sSE): squeeze channels, excite spatial positions.
        self.sSE = nn.Sequential(nn.Conv2d(in_channels, 1, 1), nn.Sigmoid())

    def forward(self, x):
        # Element-wise sum of the two recalibrated feature maps.
        return x * self.cSE(x) + x * self.sSE(x)
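A minimal end-to-end check of the combined block (illustrative, not from the original post; the channel count and feature map size are arbitrary):

import torch

scse = SCSE(in_channels=64)              # hypothetical instantiation of the block defined above
x = torch.randn(2, 64, 32, 32)           # random feature map: batch 2, 64 channels, 32x32
print(scse(x).shape)                     # torch.Size([2, 64, 32, 32]) -- same shape as the input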

Origin blog.csdn.net/qq_43456016/article/details/132186624