Analysis of Common Attention Mechanisms

1. Squeeze-and-Excitation (SE)

The main idea of SE is to improve model performance by squeezing and exciting the input features. The SE attention mechanism consists of two steps: Squeeze and Excitation. In the Squeeze step, a global average pooling operation compresses each channel of the input feature map into a single value, producing a channel descriptor vector. In the Excitation step, this vector is passed through two fully connected layers (first reducing and then restoring the channel dimension) and a sigmoid function, which maps each element to a value between 0 and 1; the resulting weights are multiplied channel-wise with the original input feature map to obtain a weighted feature map. Through the SE attention mechanism, the model adaptively learns the importance of each channel, thereby improving its performance. In practice, SE has been widely used in a variety of deep learning models with good results.

The code is shown below:

import torch
from torch import nn
from torchstat import stat  # inspect network parameters

# Define the SE attention block
class SE_block(nn.Module):
    def __init__(self, channel, ratio=16):
        super(SE_block, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
                nn.Linear(channel, channel // ratio, bias=False),
                nn.ReLU(inplace=True),
                nn.Linear(channel // ratio, channel, bias=False),
                nn.Sigmoid()
        )

    def forward(self, x):
        b, c, _, _ = x.size()
        y = self.avg_pool(x).view(b, c)
        y = self.fc(y).view(b, c, 1, 1)
        return x * y

if __name__ == '__main__':
    # Build a dummy input
    inputs = torch.rand(2, 320, 32, 32)
    # Get the number of input channels
    channel = inputs.shape[1]
    # Instantiate the block
    model = SE_block(channel, ratio=16)

    # Forward pass to check the output shape
    outputs = model(inputs)
    print(outputs.shape)  # torch.Size([2, 320, 32, 32])

    # print(model)  # inspect the model structure
    stat(model, input_size=[320, 32, 32])  # parameter/FLOP summary; no need to specify the batch dimension
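
As a usage illustration, the SE block can simply be dropped behind any convolution stage. The ConvBNReLU_SE module below is a minimal sketch of that idea (the class name and layer sizes are illustrative assumptions added here, not part of the original code), assuming SE_block from the listing above is in scope:

import torch
from torch import nn

# Minimal sketch: a conv-BN-ReLU stage followed by SE channel reweighting (illustrative only)
class ConvBNReLU_SE(nn.Module):
    def __init__(self, in_channels, out_channels, ratio=16):
        super(ConvBNReLU_SE, self).__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        self.se = SE_block(out_channels, ratio=ratio)  # SE_block defined in the listing above

    def forward(self, x):
        x = self.relu(self.bn(self.conv(x)))
        return self.se(x)  # reweight channels before the next stage

if __name__ == '__main__':
    x = torch.rand(2, 64, 56, 56)
    print(ConvBNReLU_SE(64, 128)(x).shape)  # torch.Size([2, 128, 56, 56])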

2. Convolutional Block Attention Module (CBAM)

CBAM adaptively shifts the network's focus toward key regions of the image. It consists of two modules: a channel attention module and a spatial attention module. The channel attention module learns the importance of each channel, making the network pay more attention to meaningful features while suppressing uninformative ones. Its input is the output of a convolutional layer, and its output is a set of per-channel weights. Specifically, global average pooling and global max pooling are applied to each channel to obtain two feature vectors; both are passed through a shared two-layer bottleneck (implemented with 1×1 convolutions in the code below), summed, and squashed with a sigmoid to produce the weights that rescale the convolutional output.

The spatial attention module learns the importance of different spatial regions of the image, i.e. attention weights at different locations. Its input is the channel-refined feature map. Average pooling and max pooling are applied along the channel dimension, the two resulting maps are concatenated, and a 7×7 convolution followed by a sigmoid produces an attention map over all spatial positions, which highlights the key regions of the image.

The channel and spatial attention modules are applied sequentially to build the CBAM attention mechanism. Applying this mechanism can improve accuracy on tasks such as image classification, object detection and semantic segmentation.

The code is shown below:

import torch
from torch import nn
from torchstat import stat  # inspect network parameters

# Define the CBAM attention blocks
class ChannelAttention(nn.Module):
    def __init__(self, in_planes, ratio=8):
        super(ChannelAttention, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.max_pool = nn.AdaptiveMaxPool2d(1)

        # Use 1x1 convolutions instead of fully connected layers
        self.fc1   = nn.Conv2d(in_planes, in_planes // ratio, 1, bias=False)
        self.relu1 = nn.ReLU()
        self.fc2   = nn.Conv2d(in_planes // ratio, in_planes, 1, bias=False)

        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        avg_out = self.fc2(self.relu1(self.fc1(self.avg_pool(x))))
        max_out = self.fc2(self.relu1(self.fc1(self.max_pool(x))))
        out = avg_out + max_out
        return self.sigmoid(out)

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super(SpatialAttention, self).__init__()

        assert kernel_size in (3, 7), 'kernel size must be 3 or 7'
        padding = 3 if kernel_size == 7 else 1
        self.conv1 = nn.Conv2d(2, 1, kernel_size, padding=padding, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        avg_out = torch.mean(x, dim=1, keepdim=True)
        max_out, _ = torch.max(x, dim=1, keepdim=True)
        x = torch.cat([avg_out, max_out], dim=1)
        x = self.conv1(x)
        return self.sigmoid(x)

class CBAM_block(nn.Module):
    def __init__(self, channel, ratio=8, kernel_size=7):
        super(CBAM_block, self).__init__()
        self.channelattention = ChannelAttention(channel, ratio=ratio)
        self.spatialattention = SpatialAttention(kernel_size=kernel_size)

    def forward(self, x):
        x = x * self.channelattention(x)
        x = x * self.spatialattention(x)
        return x


if __name__ == '__main__':
    # Build a dummy input
    inputs = torch.rand(2, 320, 32, 32)
    # Get the number of input channels
    channel = inputs.shape[1]
    # Instantiate the block
    model = CBAM_block(channel, ratio=16, kernel_size=7)

    # Forward pass to check the output shape
    outputs = model(inputs)
    print(outputs.shape)  # torch.Size([2, 320, 32, 32])

    # print(model)  # inspect the model structure
    stat(model, input_size=[320, 32, 32])  # parameter/FLOP summary; no need to specify the batch dimension
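
In the CBAM paper the module is typically placed inside the residual branch of a ResNet block, so that the attention-refined features are added back to the identity path. The BasicBlock_CBAM module below is a rough sketch of that placement (the class name and layer configuration are illustrative assumptions, and CBAM_block from the listing above is assumed to be in scope):

import torch
from torch import nn

# Rough sketch: CBAM applied to the residual branch of a ResNet-style basic block (illustrative only)
class BasicBlock_CBAM(nn.Module):
    def __init__(self, channel, ratio=8, kernel_size=7):
        super(BasicBlock_CBAM, self).__init__()
        self.conv1 = nn.Conv2d(channel, channel, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channel)
        self.conv2 = nn.Conv2d(channel, channel, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channel)
        self.relu = nn.ReLU(inplace=True)
        self.cbam = CBAM_block(channel, ratio=ratio, kernel_size=kernel_size)  # from the listing above

    def forward(self, x):
        identity = x
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = self.cbam(out)  # refine the residual branch with CBAM before the skip connection
        return self.relu(out + identity)

if __name__ == '__main__':
    x = torch.rand(2, 64, 32, 32)
    print(BasicBlock_CBAM(64)(x).shape)  # torch.Size([2, 64, 32, 32])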

3. Efficient Channel Attention (ECA)

ECA is a lightweight channel attention mechanism for image models. It improves feature representation by reweighting the channels of a feature map. The ECA module first applies global average pooling to aggregate each channel's spatial information into a single descriptor. Instead of the fully connected bottleneck used in SE, it then applies a 1D convolution across neighboring channels, whose kernel size is chosen adaptively from the number of channels, followed by a sigmoid that produces the channel weights. Channels whose information is not critical are suppressed, while informative channels are preserved or emphasized. Because it avoids dimensionality reduction and adds only a handful of parameters, ECA has low model complexity and high computational efficiency while remaining effective, and it is widely used in areas such as image classification, object detection, and image segmentation.

The code is shown below:

import torch
import math
from torch import nn
from torchstat import stat  # inspect network parameters

# Define the ECA attention block
class ECA_block(nn.Module):
    def __init__(self, channel, b=1, gamma=2):
        super(ECA_block, self).__init__()
        # Adaptive 1D kernel size derived from the channel count, forced to be odd
        kernel_size = int(abs((math.log(channel, 2) + b) / gamma))
        kernel_size = kernel_size if kernel_size % 2 else kernel_size + 1

        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=kernel_size, padding=(kernel_size - 1) // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        y = self.avg_pool(x)
        y = self.conv(y.squeeze(-1).transpose(-1, -2)).transpose(-1, -2).unsqueeze(-1)
        y = self.sigmoid(y)
        return x * y.expand_as(x)


if __name__ == '__main__':
    # Build a dummy input
    inputs = torch.rand(2, 320, 32, 32)
    # Get the number of input channels
    channel = inputs.shape[1]
    # Instantiate the block
    model = ECA_block(channel)

    # Forward pass to check the output shape
    outputs = model(inputs)
    print(outputs.shape)  # torch.Size([2, 320, 32, 32])

    # print(model)  # inspect the model structure
    stat(model, input_size=[320, 32, 32])  # parameter/FLOP summary; no need to specify the batch dimension
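
The only hyperparameters of ECA are b and gamma, which determine the 1D convolution kernel size from the channel count. The short snippet below is a sketch added for illustration; it simply reproduces the kernel-size rule from ECA_block above and prints the sizes a few channel counts map to:

import math

# Reproduces the adaptive kernel-size rule used in ECA_block above (illustration only)
def eca_kernel_size(channel, b=1, gamma=2):
    k = int(abs((math.log(channel, 2) + b) / gamma))
    return k if k % 2 else k + 1  # force an odd kernel size

if __name__ == '__main__':
    for c in (64, 128, 320, 512):
        print(c, eca_kernel_size(c))  # 64 -> 3, 128 -> 5, 320 -> 5, 512 -> 5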

4. Coordinate Attention (CA)

CA avoids the loss of position information caused by 2D global pooling: it attends separately along the width and height dimensions and thus makes effective use of the spatial coordinate information of the input feature map. CA consists of two steps. The first step is coordinate information embedding: given an input X, each channel is pooled along the horizontal and vertical directions separately, yielding a pair of direction-aware feature maps. The second step is coordinate attention generation: the two feature maps are concatenated and passed through a shared 1×1 convolution (with batch normalization and a non-linearity) to obtain an intermediate feature map, which is then split back into two tensors along the spatial dimension; each tensor is transformed by another 1×1 convolution back to the original number of channels and passed through a sigmoid, and the two resulting attention maps are expanded and multiplied with the input as attention weights. CA is a simple and flexible plug-and-play module that can improve network accuracy with very little additional overhead.

The code is shown below:

import torch
from torch import nn
from torchstat import stat  # inspect network parameters

# Define the Coordinate Attention (CA) block
class CA_Block(nn.Module):
    def __init__(self, channel, reduction=16):
        super(CA_Block, self).__init__()

        self.conv_1x1 = nn.Conv2d(in_channels=channel, out_channels=channel // reduction, kernel_size=1, stride=1,
                                  bias=False)

        self.relu = nn.ReLU()
        self.bn = nn.BatchNorm2d(channel // reduction)

        self.F_h = nn.Conv2d(in_channels=channel // reduction, out_channels=channel, kernel_size=1, stride=1,
                             bias=False)
        self.F_w = nn.Conv2d(in_channels=channel // reduction, out_channels=channel, kernel_size=1, stride=1,
                             bias=False)

        self.sigmoid_h = nn.Sigmoid()
        self.sigmoid_w = nn.Sigmoid()

    def forward(self, x):
        _, _, h, w = x.size()

        # Average-pool along the width (dim=3) and height (dim=2) to get direction-aware descriptors
        x_h = torch.mean(x, dim=3, keepdim=True).permute(0, 1, 3, 2)
        x_w = torch.mean(x, dim=2, keepdim=True)

        x_cat_conv_relu = self.relu(self.bn(self.conv_1x1(torch.cat((x_h, x_w), 3))))

        x_cat_conv_split_h, x_cat_conv_split_w = x_cat_conv_relu.split([h, w], 3)

        s_h = self.sigmoid_h(self.F_h(x_cat_conv_split_h.permute(0, 1, 3, 2)))
        s_w = self.sigmoid_w(self.F_w(x_cat_conv_split_w))

        out = x * s_h.expand_as(x) * s_w.expand_as(x)
        return out

if __name__ == '__main__':
    # Build a dummy input
    inputs = torch.rand(2, 320, 32, 32)
    # Get the number of input channels
    channel = inputs.shape[1]
    # Instantiate the block
    model = CA_Block(channel, reduction=16)

    # Forward pass to check the output shape
    outputs = model(inputs)
    print(outputs.shape)  # torch.Size([2, 320, 32, 32])

    # print(model)  # inspect the model structure
    stat(model, input_size=[320, 32, 32])  # parameter/FLOP summary; no need to specify the batch dimension
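
To get a quick feel for the relative size of the four blocks, the snippet below (a sketch added here, assuming SE_block, CBAM_block, ECA_block and CA_Block from the listings above are defined in the same file) counts their learnable parameters for a 320-channel input:

import torch

# Comparison sketch: parameter counts of the four attention blocks for a 320-channel input
# (assumes SE_block, CBAM_block, ECA_block and CA_Block from the listings above are in scope)
if __name__ == '__main__':
    channel = 320
    blocks = {
        'SE': SE_block(channel, ratio=16),
        'CBAM': CBAM_block(channel, ratio=16, kernel_size=7),
        'ECA': ECA_block(channel),
        'CA': CA_Block(channel, reduction=16),
    }
    x = torch.rand(2, channel, 32, 32)
    for name, block in blocks.items():
        params = sum(p.numel() for p in block.parameters())
        print(f'{name}: output {tuple(block(x).shape)}, {params} trainable parameters')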

