[Deep Learning Attention Mechanism Series] - CBAM Attention Mechanism (with PyTorch implementation)

CBAM (Convolutional Block Attention Module) is an attention module for improving the performance of convolutional neural networks (CNNs). It was proposed by Sanghyun Woo et al. in the 2018 paper "CBAM: Convolutional Block Attention Module" (arXiv:1807.06521). CBAM adds channel attention and spatial attention to a CNN to strengthen the model's feature representations. It does this with only a small number of extra parameters and little extra computation.

1. Overview

Ordinary convolution layers apply the same learned filters regardless of the input content. They have no input-dependent mechanism for deciding which channels or which locations of a feature map matter most. CBAM addresses this with two attention mechanisms, channel attention and spatial attention. Channel attention decides *what* to emphasize by reweighting the feature channels. Spatial attention decides *where* to emphasize by reweighting positions in the feature map.

2. Model structure

CBAM consists of two key components: a channel attention module and a spatial attention module. The two are applied one after the other. The combined module can be inserted after convolutional blocks at different depths of a CNN to refine their feature maps.

2.1 Channel attention module


The channel attention module learns how much each channel of the feature map should contribute. It works in the following steps:

  1. Global max pooling and global average pooling: Apply global max pooling and global average pooling to the input feature map (C×H×W). These take the maximum and the mean of each channel over all spatial positions. The result is two C-dimensional vectors (C×1×1) holding the maximum and the average response of each channel.

  2. Shared MLP: Both pooled vectors pass through the same small multilayer perceptron (MLP) with one hidden layer, built from three parts:
     - a fully connected layer that reduces the C channels to C/r (reduction ratio r, 16 in the paper);
     - a ReLU;
     - a fully connected layer that restores C outputs.

     Sharing the weights keeps the parameter count low. Through training, the network learns which channels matter more for the current task. The two MLP outputs are then summed element-wise.

  3. Sigmoid activation: A sigmoid function maps the summed vector to channel attention weights between 0 and 1, one weight per channel.

  4. Attention weighting: Each channel of the original feature map is multiplied by its attention weight, giving the channel-refined feature map. This emphasizes channels that help the current task and suppresses less relevant ones.
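The four steps above can be summarized in formula form, following the notation of the original paper. Here F ∈ ℝ^{C×H×W} is the input, r is the reduction ratio, and σ is the sigmoid. ⊗ denotes element-wise multiplication, broadcast over H and W. The ReLU inside the MLP is written out explicitly here for clarity.

```latex
\begin{aligned}
\mathbf{M}_c(\mathbf{F}) &= \sigma\Big(\mathrm{MLP}\big(\mathrm{AvgPool}(\mathbf{F})\big) + \mathrm{MLP}\big(\mathrm{MaxPool}(\mathbf{F})\big)\Big) \in \mathbb{R}^{C\times 1\times 1},\\
\mathrm{MLP}(\mathbf{z}) &= \mathbf{W}_1\,\mathrm{ReLU}(\mathbf{W}_0\,\mathbf{z}), \qquad \mathbf{W}_0 \in \mathbb{R}^{(C/r)\times C},\ \ \mathbf{W}_1 \in \mathbb{R}^{C\times (C/r)},\\
\mathbf{F}' &= \mathbf{M}_c(\mathbf{F}) \otimes \mathbf{F}.
\end{aligned}
```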

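Code implementation:

```python
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """
    Channel attention of the CBAM hybrid attention mechanism
    """

    def __init__(self, in_channels, ratio=16):
        super(ChannelAttention, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.max_pool = nn.AdaptiveMaxPool2d(1)

        self.fc = nn.Sequential(
            # Equivalent fully connected version:
            # nn.Linear(in_channels, in_channels // ratio, bias=False),
            # nn.ReLU(),
            # nn.Linear(in_channels // ratio, in_channels, bias=False)

            # 1x1 convolutions act as fully connected layers on the pooled
            # (B, C, 1, 1) tensor, so no flatten/reshape is needed
            nn.Conv2d(in_channels, in_channels // ratio, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels // ratio, in_channels, 1, bias=False)
        )

        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        # Shared MLP applied to the average- and max-pooled descriptors
        avg_out = self.fc(self.avg_pool(x))
        max_out = self.fc(self.max_pool(x))
        # Sum, then squash to (0, 1) channel weights of shape (B, C, 1, 1)
        out = self.sigmoid(avg_out + max_out)
        # Broadcast the weights over H and W to rescale each channel
        return out * x
```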
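The comment in `fc` says the 1x1 convolutions stand in for fully connected layers. The snippet below is a small standalone check of that equivalence. It uses arbitrarily chosen sizes (`C = 64`, `r = 16`) and a random input, none of which come from the original code. It copies the weights of a 1x1-convolution MLP into an `nn.Linear` MLP and compares the two outputs on a pooled feature map.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
C, r = 64, 16  # illustrative channel count and reduction ratio

# Shared MLP written with 1x1 convolutions, as in ChannelAttention
conv_mlp = nn.Sequential(
    nn.Conv2d(C, C // r, 1, bias=False),
    nn.ReLU(),
    nn.Conv2d(C // r, C, 1, bias=False),
)

# The same MLP written with fully connected layers
linear_mlp = nn.Sequential(
    nn.Linear(C, C // r, bias=False),
    nn.ReLU(),
    nn.Linear(C // r, C, bias=False),
)

# A Conv2d weight of shape (out, in, 1, 1) holds the same numbers as a
# Linear weight of shape (out, in), so copy them across
with torch.no_grad():
    linear_mlp[0].weight.copy_(conv_mlp[0].weight.view(C // r, C))
    linear_mlp[2].weight.copy_(conv_mlp[2].weight.view(C, C // r))

x = torch.randn(2, C, 32, 32)
pooled = nn.AdaptiveAvgPool2d(1)(x)         # (2, C, 1, 1)

out_conv = conv_mlp(pooled)                 # (2, C, 1, 1)
out_linear = linear_mlp(pooled.flatten(1))  # (2, C)

print(torch.allclose(out_conv.flatten(1), out_linear, atol=1e-5))
```

Because the pooled tensor is 1×1 spatially, each 1×1 convolution reduces to a matrix multiplication over channels. The convolutional form is therefore mainly a convenience that avoids flattening and reshaping.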

2.2 Spatial attention module


The spatial attention module learns which locations in the feature map are important. It works in the following steps:

  1. Channel-wise max pooling and average pooling: For the input feature map (C×H×W), take the maximum and the mean across all channels at every spatial position. This yields two single-channel maps (1×H×W) that summarize the channel responses at each location.
  2. Concatenation and convolution: The two maps are concatenated along the channel dimension into a 2×H×W map. This map passes through a convolutional layer (7×7 in the paper) that outputs a single-channel spatial attention map.
  3. Sigmoid activation: As in the channel attention module, a sigmoid function maps the attention map to values between 0 and 1.
  4. Attention weighting: The spatial attention map is multiplied with the input feature map, weighting the features at each spatial location across all channels. This highlights important regions and reduces the influence of unimportant ones.
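The steps above can be written in the same notation as follows. F is the module's input; inside CBAM this is the channel-refined map. AvgPool and MaxPool are now taken along the channel axis, and [·;·] denotes concatenation. f^{k×k} is a convolution with a k×k kernel (k = 7 in the paper).

```latex
\begin{aligned}
\mathbf{M}_s(\mathbf{F}) &= \sigma\Big(f^{k\times k}\big([\mathrm{AvgPool}(\mathbf{F});\ \mathrm{MaxPool}(\mathbf{F})]\big)\Big) \in \mathbb{R}^{1\times H\times W},\\
\text{output} &= \mathbf{M}_s(\mathbf{F}) \otimes \mathbf{F}.
\end{aligned}
```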

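Code implementation:

```python
class SpatialAttention(nn.Module):
    """
    Spatial attention of the CBAM hybrid attention mechanism
    """

    def __init__(self, kernel_size=7):
        super(SpatialAttention, self).__init__()

        assert kernel_size in (3, 7), 'kernel size must be 3 or 7'
        # "Same" padding keeps the H x W size of the attention map
        padding = 3 if kernel_size == 7 else 1
        # 2 input channels (average map + max map) -> 1 attention map
        self.conv1 = nn.Conv2d(2, 1, kernel_size, padding=padding, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        # Pool across the channel dimension: each result is (B, 1, H, W)
        avg_out = torch.mean(x, dim=1, keepdim=True)
        max_out, _ = torch.max(x, dim=1, keepdim=True)
        out = torch.cat([avg_out, max_out], dim=1)  # (B, 2, H, W)
        out = self.sigmoid(self.conv1(out))         # (B, 1, H, W), values in (0, 1)
        # Broadcast the map over all channels to rescale each location
        return out * x
```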
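To make the tensor shapes concrete, here is a standalone sketch of the operations inside `SpatialAttention.forward`. It uses a random input of an arbitrarily chosen size, 2×64×32×32. It writes the padding as `k // 2`, which for kernel sizes 3 and 7 gives the same values (1 and 3) as the conditional in `__init__`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(2, 64, 32, 32)  # (batch, channels, height, width)

# Pool across the channel axis (dim=1), not over the spatial axes
avg_map = torch.mean(x, dim=1, keepdim=True)    # (2, 1, 32, 32)
max_map, _ = torch.max(x, dim=1, keepdim=True)  # (2, 1, 32, 32)
stacked = torch.cat([avg_map, max_map], dim=1)  # (2, 2, 32, 32)

for k in (3, 7):
    # padding = k // 2 keeps height and width unchanged for odd k
    conv = nn.Conv2d(2, 1, k, padding=k // 2, bias=False)
    attn = torch.sigmoid(conv(stacked))  # (2, 1, 32, 32)
    out = attn * x                       # broadcast over all 64 channels
    print(f"kernel {k}: attention {tuple(attn.shape)}, output {tuple(out.shape)}")
```

For both kernel sizes, the attention map stays 1×32×32 per sample and the output keeps the input's shape.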

2.3 Hybrid attention module


CBAM applies the two modules sequentially. First, the channel attention module reweights the channels of the input feature map. Then the spatial attention module reweights the spatial locations of this channel-refined map. Each attention map is multiplied element-wise, with broadcasting, onto its own input. The refined feature map is passed to subsequent layers, emphasizing informative features and suppressing noise and irrelevant information. The ablation experiments in the original paper show that channel attention first, then spatial attention, works better than the reverse order or a parallel arrangement. This is an empirical finding rather than a derived one, typical of the trial-and-error "alchemy" of deep learning.
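Writing M_c and M_s for the channel and spatial attention maps, the sequential arrangement described above reads as follows (as given in the original paper). ⊗ is element-wise multiplication with broadcasting.

```latex
\begin{aligned}
\mathbf{F}' &= \mathbf{M}_c(\mathbf{F}) \otimes \mathbf{F},\\
\mathbf{F}'' &= \mathbf{M}_s(\mathbf{F}') \otimes \mathbf{F}'.
\end{aligned}
```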

Code implementation:

```python
class CBAM(nn.Module):
    """
    CBAM hybrid attention mechanism: channel attention followed by spatial attention
    """

    # kernel_size defaults to 7 as in the paper; 3 is also accepted
    def __init__(self, in_channels, ratio=16, kernel_size=7):
        super(CBAM, self).__init__()
        self.channelattention = ChannelAttention(in_channels, ratio=ratio)
        self.spatialattention = SpatialAttention(kernel_size=kernel_size)

    def forward(self, x):
        x = self.channelattention(x)  # F' = Mc(F) * F
        x = self.spatialattention(x)  # F'' = Ms(F') * F'
        return x
```
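The original paper inserts CBAM into each ResNet block, applying it to the output of the residual branch before that output is added to the shortcut. The sketch below follows this placement in a simplified basic block. The names `BasicBlockWithCBAM`, `_ChannelAttention` and `_SpatialAttention` are hypothetical. The two attention classes are compact, self-contained restatements of the modules above; the hidden width is clamped to at least 1 so that small channel counts do not break the MLP.

```python
import torch
import torch.nn as nn


class _ChannelAttention(nn.Module):
    """Compact channel attention (same logic as ChannelAttention above)."""

    def __init__(self, channels, ratio=16):
        super().__init__()
        hidden = max(channels // ratio, 1)  # avoid a zero-width hidden layer
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, hidden, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, 1, bias=False),
        )

    def forward(self, x):
        avg = self.mlp(torch.mean(x, dim=(2, 3), keepdim=True))  # global average pool
        mx = self.mlp(torch.amax(x, dim=(2, 3), keepdim=True))   # global max pool
        return torch.sigmoid(avg + mx) * x


class _SpatialAttention(nn.Module):
    """Compact spatial attention (same logic as SpatialAttention above)."""

    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):
        avg = torch.mean(x, dim=1, keepdim=True)
        mx, _ = torch.max(x, dim=1, keepdim=True)
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1))) * x


class BasicBlockWithCBAM(nn.Module):
    """Hypothetical ResNet-style basic block with CBAM before the residual addition."""

    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        # Channel attention first, then spatial attention
        self.cbam = nn.Sequential(_ChannelAttention(channels), _SpatialAttention())
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.cbam(self.body(x))  # refine the residual branch
        return self.relu(out + x)      # identity shortcut, then activation


block = BasicBlockWithCBAM(64)
y = block(torch.randn(2, 64, 56, 56))
print(y.shape)  # torch.Size([2, 64, 56, 56])
```

The output keeps the input shape, so this block can replace an ordinary basic block wherever the input and output channel counts match. The sketch includes no downsampling shortcut.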

Summary

In summary, CBAM improves the feature representations of convolutional neural networks by learning channel and spatial attention weights adaptively from the input. By combining the two, it lets the network refine features along both the channel and the spatial dimension. The original paper reports consistent accuracy gains when CBAM is added to standard backbones on ImageNet-1K classification and on MS COCO and PASCAL VOC object detection.


Source: blog.csdn.net/qq_43456016/article/details/132187911