PyTorch image-processing attention mechanisms explained: the SENet, CBAM, and ECA modules

Table of contents

1. Attention mechanism

1.1 SENet (Squeeze-and-Excitation Network)

1.1.1 Principle of SENet

1.1.2 SENet code example

1.2 CBAM (Convolutional Block Attention Module)

1.2.1 Principle of CBAM

1.2.2 CBAM code example

1.3 ECA (Efficient Channel Attention)

1.3.1 Principle of ECA

1.3.2 ECA code example


 

1. Attention mechanism

The attention mechanism was originally proposed to solve problems in natural language processing (NLP), where it lets a model dynamically focus on different positions of a sequence. It was later introduced into image processing, giving deep learning models a more flexible and effective way to extract information. The core idea is to dynamically adjust how much attention the model pays to different parts of the input, so that it concentrates on the information most useful for the current task.

In image processing, attention mechanisms are widely used in tasks such as image classification, object detection, and image segmentation. By assigning different weights to different spatial positions or channels of an image, the model can better capture the important information it contains. In image classification, attention helps the model focus on the regions relevant to the class label; in object detection, it helps the model locate and detect targets more reliably; in image segmentation, it helps the model delineate objects more accurately.

1.1 SENet (Squeeze-and-Excitation Network)

SENet (Squeeze-and-Excitation Network) is a channel attention mechanism that learns a weight for each channel of a feature map and uses it to emphasize the more informative channels. It was proposed by Jie Hu et al. in the 2017 paper "Squeeze-and-Excitation Networks".

1.1.1 Principle of SENet

SENet achieves channel attention through the following two steps:

  1. Squeeze: apply global average pooling to each channel to obtain a single descriptor per channel. This compresses the spatial information of each channel into one value.

  2. Excitation: learn the channel weights with a small bottleneck of two fully connected layers (a ReLU in between, controlled by a reduction ratio) followed by a sigmoid, producing a channel attention vector. This attention vector represents the importance of each channel.

Finally, the learned channel attention vector is multiplied with the original feature map, rescaling each channel according to its learned importance.


1.1.2 SENet code example

import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, in_channels, reduction=16):
        super(SEBlock, self).__init__()
        # Squeeze: global average pooling collapses each channel to a single value
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        # Excitation: bottleneck MLP (reduction ratio) followed by a sigmoid
        self.fc = nn.Sequential(
            nn.Linear(in_channels, in_channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(in_channels // reduction, in_channels),
            nn.Sigmoid()
        )

    def forward(self, x):
        b, c, _, _ = x.size()
        y = self.avg_pool(x).view(b, c)      # (b, c): one descriptor per channel
        y = self.fc(y).view(b, c, 1, 1)      # (b, c, 1, 1): per-channel weights in (0, 1)
        return x * y                         # rescale each channel of the input

In the code above we define a SEBlock class, the basic building block of SENet. It implements channel attention with a global average pooling layer followed by two fully connected layers. To use it, insert a SEBlock into the model wherever enhanced channel attention is desired.
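
As a quick sanity check, the block can be applied to a dummy feature map (a minimal sketch; the batch size, channel count, and spatial size below are arbitrary and chosen only for illustration):

# illustrative shapes only
se = SEBlock(in_channels=64, reduction=16)
x = torch.randn(2, 64, 32, 32)
out = se(x)
print(out.shape)  # torch.Size([2, 64, 32, 32]) -- same shape, channels rescaled by their weights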

A previous post used transfer learning with ResNet-50 to classify cats and dogs; we can now add the SE attention mechanism to that model.

 PyTorch transfer learning: using ResNet-50 to train a cat-vs-dog classifier - Programmer Sought

We first create a ResNet-50 model and then apply a SEBlock after each residual stage, implementing SENet's channel attention mechanism. Finally, we replace the fully connected layer to match the new number of classes. Next, we define hyperparameters and data transformations and load the dataset as before, then create the model, optimizer, and loss function, and train and test. Here is the key code for adding the attention mechanism:

from torchvision.models import resnet50

class ResNetSE(nn.Module):
    def __init__(self, num_classes, reduction=16):
        super(ResNetSE, self).__init__()
        self.resnet = resnet50(pretrained=True)
        in_channels = self.resnet.fc.in_features
        # replace the classifier head to match the new number of classes
        self.resnet.fc = nn.Linear(in_channels, num_classes)
        # one SE block after each residual stage; channel counts match ResNet-50's stage outputs
        self.se1 = SEBlock(256, reduction)
        self.se2 = SEBlock(512, reduction)
        self.se3 = SEBlock(1024, reduction)
        self.se4 = SEBlock(2048, reduction)

    def forward(self, x):
        # stem
        x = self.resnet.conv1(x)
        x = self.resnet.bn1(x)
        x = self.resnet.relu(x)
        x = self.resnet.maxpool(x)
        # residual stages, each followed by channel attention
        x = self.resnet.layer1(x)
        x = self.se1(x)
        x = self.resnet.layer2(x)
        x = self.se2(x)
        x = self.resnet.layer3(x)
        x = self.se3(x)
        x = self.resnet.layer4(x)
        x = self.se4(x)
        # classification head
        x = self.resnet.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.resnet.fc(x)
        return x
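
The surrounding training loop is unchanged from the earlier post; below is a minimal setup sketch (the learning rate, batch size, and dummy tensors are placeholders for illustration, not values from the original post):

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = ResNetSE(num_classes=2).to(device)                  # 2 classes: cat and dog
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)   # placeholder learning rate

# one training step on a dummy batch, just to show the call pattern
images = torch.randn(4, 3, 224, 224, device=device)
labels = torch.randint(0, 2, (4,), device=device)
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()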

1.2 CBAM (Convolutional Block Attention Module)

CBAM (Convolutional Block Attention Module) is an attention mechanism that combines channel attention, as in SENet, with spatial attention. It was proposed by Sanghyun Woo et al. in the 2018 paper "CBAM: Convolutional Block Attention Module". Because it attends to both channels and spatial positions, CBAM can achieve better results than SENet's channel-only attention. Its implementation is sketched in the diagram below: CBAM applies the channel attention mechanism and then the spatial attention mechanism to the input feature layer.

1.2.1 Principle of CBAM

The following figure shows the specific implementation of the channel attention mechanism and the spatial attention mechanism:

[Figure: CBAM channel attention (upper part) and spatial attention (lower part)]

 1.  The upper part of the figure is the channel attention mechanism, which has two branches. The input feature layer is passed through both global average pooling and global max pooling, and the two pooled results are processed by a shared fully connected layer. The two processed results are added together and passed through a sigmoid, giving a weight between 0 and 1 for each channel of the input feature layer. This weight is then multiplied with the original input feature layer.

 

2.  The lower part of the figure is the spatial attention mechanism. For each spatial position of the input feature layer, we take the maximum and the average across the channel dimension. The two resulting maps are stacked, passed through a convolution with a single output channel, and then through a sigmoid, giving a weight between 0 and 1 for each spatial position of the input feature layer. This weight is then multiplied with the original input feature layer.

1.2.2 CBAM code example

import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, in_channels, reduction=16):
        super(ChannelAttention, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.max_pool = nn.AdaptiveMaxPool2d(1)
        # shared MLP applied to both the average-pooled and max-pooled descriptors
        self.fc = nn.Sequential(
            nn.Linear(in_channels, in_channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(in_channels // reduction, in_channels)
        )
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        b, c, _, _ = x.size()
        avg_y = self.avg_pool(x).view(b, c)
        max_y = self.max_pool(x).view(b, c)
        # add the two branch outputs, then take a sigmoid to get per-channel weights
        y = self.sigmoid(self.fc(avg_y) + self.fc(max_y)).view(b, c, 1, 1)
        return x * y

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super(SpatialAttention, self).__init__()
        assert kernel_size in (3, 7), "kernel size must be 3 or 7"
        padding = 3 if kernel_size == 7 else 1
        # 2 input channels (average map and maximum map), 1 output channel
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=padding, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        # per-position average and maximum over the channel dimension
        avg_out = torch.mean(x, dim=1, keepdim=True)
        max_out, _ = torch.max(x, dim=1, keepdim=True)
        attn = torch.cat([avg_out, max_out], dim=1)
        attn = self.sigmoid(self.conv(attn))   # (b, 1, h, w) spatial weights in (0, 1)
        return x * attn                        # rescale the input feature map

class CBAMBlock(nn.Module):
    def __init__(self, in_channels, reduction=16, kernel_size=7):
        super(CBAMBlock, self).__init__()
        self.channel_attention = ChannelAttention(in_channels, reduction)
        self.spatial_attention = SpatialAttention(kernel_size)

    def forward(self, x):
        # channel attention first, then spatial attention
        x = self.channel_attention(x)
        x = self.spatial_attention(x)
        return x
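
A minimal usage sketch, analogous to how SEBlock was inserted into ResNet-50 above (the channel count and feature-map size are illustrative):

cbam = CBAMBlock(in_channels=256, reduction=16, kernel_size=7)
feat = torch.randn(2, 256, 56, 56)   # e.g. a feature map the size of ResNet-50's layer1 output
out = cbam(feat)                     # same shape, refined by channel attention then spatial attention
print(out.shape)  # torch.Size([2, 256, 56, 56])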

1.3 ECA (Efficient Channel Attention)

ECA (Efficient Channel Attention) is a lightweight channel attention mechanism that learns channel attention with a 1D convolution, which greatly reduces the number of parameters and the computational cost. It was proposed by Qilong Wang et al. in the paper "ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks" (arXiv 2019, CVPR 2020).

1.3.1 Principle of ECA

ECA learns channel attention in a more efficient way: it applies a 1D convolution across the per-channel descriptors and then a sigmoid function to obtain the channel attention weights. This greatly reduces computational complexity while still improving model performance to a certain extent.

 


 

The SE attention mechanism first compresses the channel dimension of the input feature map, and this dimensionality reduction has an adverse effect on learning the dependencies between channels. Based on this observation, the ECA attention mechanism avoids dimensionality reduction and instead uses a 1-dimensional convolution to efficiently realize local cross-channel interaction and extract the dependencies between channels. The specific steps are as follows:

① Perform global average pooling on the input feature map to obtain a per-channel descriptor y;

② Apply a 1-dimensional convolution with kernel size k across the channels and pass the result through a Sigmoid activation to obtain the weight w of each channel, as in the formula below;

w = σ(C1D_k(y))

where C1D_k denotes a 1-dimensional convolution with kernel size k and σ is the Sigmoid function;

③ Multiply the weights element-wise with the original input feature map to obtain the final output feature map.
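
In the ECA paper, the kernel size k is not a fixed hyperparameter but is chosen adaptively from the number of channels C:

k = ψ(C) = | log2(C) / γ + b / γ |_odd, with γ = 2 and b = 1 by default,

where |t|_odd denotes the nearest odd number to t, so a larger channel count gets a slightly larger cross-channel interaction range. The code below follows this rule, computing k from channel, gamma, and b and rounding it up to an odd number.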

1.3.2 ECA code example

import math
import torch
import torch.nn as nn

class eca_block(nn.Module):
    def __init__(self, channel, b=1, gamma=2):
        super(eca_block, self).__init__()
        # adaptively choose the 1D convolution kernel size from the channel count,
        # rounding up to an odd number
        kernel_size = int(abs((math.log(channel, 2) + b) / gamma))
        kernel_size = kernel_size if kernel_size % 2 else kernel_size + 1

        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=kernel_size, padding=(kernel_size - 1) // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        # squeeze: (b, c, h, w) -> (b, c, 1, 1)
        y = self.avg_pool(x)
        # treat the channel descriptors as a 1D sequence and convolve across channels
        y = self.conv(y.squeeze(-1).transpose(-1, -2)).transpose(-1, -2).unsqueeze(-1)
        y = self.sigmoid(y)
        # rescale every channel of the input by its learned weight
        return x * y.expand_as(x)
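
A minimal usage sketch (the channel count and feature-map size are illustrative); note that the kernel size is derived automatically from the channel count:

eca = eca_block(channel=512)          # k works out to 5 for 512 channels
feat = torch.randn(2, 512, 28, 28)
out = eca(feat)
print(out.shape)  # torch.Size([2, 512, 28, 28])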

That's the end of this article.

 

 
