Attention mechanisms: SK Attention

1. SK Attention module

Link: Selective Kernel Networks

2. Model structure diagram:

3. Main content of the paper

Receptive fields of different sizes respond differently to targets of different scales, so the goal of the paper is to let the network automatically use the information captured by the receptive fields that are most useful for classification. To this end, the authors propose a dynamic selection mechanism for convolution kernels in CNNs, which allows each neuron to adaptively adjust its receptive field size (that is, its effective convolution kernel size) according to multi-scale input information. The mechanism is called "Selective Kernel" (SK). It is intended to capture the multi-scale features of complex images while avoiding much of the computation that a general CNN would waste. Another stated advantage of SKNet is that it aggregates deep features, which makes the network easier to understand and interpret.

The inspiration comes from the visual cortex: when we look at objects of different sizes and at different distances, the receptive field sizes of the neurons adjust according to the stimulus. Concretely, the paper designs a building block called the Selective Kernel (SK) unit, in which multiple branches with different kernel sizes are fused using softmax attention guided by the information in those branches. SKNet is built from multiple SK units, so its neurons can capture target objects at different scales.

An SK unit consists of three operations: Split, Fuse, and Select. The Split operator generates multiple paths with different kernel sizes. The model in the figure above uses only two kernel sizes, but the design extends to more branches. The Fuse operator combines and aggregates the information from all paths into a global, comprehensive representation that is used to compute the selection weights. The Select operator aggregates the feature maps of the different kernel sizes according to those selection weights.
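
For the two-branch case, the Fuse and Select steps can be written out as below. This is a sketch in the notation of the SKNet paper: $\tilde{U}$ and $\hat{U}$ are the outputs of the two branches, $\delta$ is ReLU, $\mathcal{B}$ is batch normalization, $r$ is the reduction ratio, and $L$ is the lower bound on the reduced dimension $d$. $A_c$ and $B_c$ are the rows of the two per-branch weight matrices that belong to channel $c$.

```latex
% Fuse: element-wise sum, global average pooling, then a compact feature z
U = \tilde{U} + \hat{U}, \qquad
s_c = \frac{1}{H W} \sum_{i=1}^{H} \sum_{j=1}^{W} U_c(i, j), \qquad
z = \delta\big(\mathcal{B}(W s)\big), \quad W \in \mathbb{R}^{d \times C}, \quad d = \max(C / r,\ L)

% Select: softmax across the two branches, per channel
a_c = \frac{e^{A_c z}}{e^{A_c z} + e^{B_c z}}, \qquad
b_c = \frac{e^{B_c z}}{e^{A_c z} + e^{B_c z}}, \qquad
V_c = a_c \, \tilde{U}_c + b_c \, \hat{U}_c, \qquad a_c + b_c = 1
```

Because $a_c + b_c = 1$, each output channel is a convex combination of the two branch outputs. Note that the code in section 4 computes $z$ with a plain linear layer, without the batch normalization and ReLU shown here.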

Overall, the advantage of the SK structure is that it captures multi-scale spatial information in the image more efficiently.
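
How efficient the unit is in practice depends largely on how the branches are built. As a rough illustration, the sketch below counts the convolution parameters of one branch for the kernel sizes used by the implementation in section 4, with and without grouped convolution. The helper `conv2d_param_count` is a hypothetical name introduced here, and `groups=32` is only an illustrative value, not a setting taken from this article.

```python
def conv2d_param_count(in_ch, out_ch, k, groups=1, bias=True):
    """Learnable parameters of a k x k 2D convolution (hypothetical helper)."""
    if in_ch % groups or out_ch % groups:
        raise ValueError("in_ch and out_ch must be divisible by groups")
    # each output channel only sees in_ch // groups input channels
    weights = out_ch * (in_ch // groups) * k * k
    return weights + (out_ch if bias else 0)


channel = 512
for k in (1, 3, 5, 7):
    dense = conv2d_param_count(channel, channel, k, groups=1)
    grouped = conv2d_param_count(channel, channel, k, groups=32)
    print(f"k={k}: groups=1 -> {dense:,} params, groups=32 -> {grouped:,} params")
```

The count grows with the square of the kernel size, so the large-kernel branches dominate the cost. Grouped convolution divides the weight count by the number of groups.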

4. Code example

import torch
from torch import nn
from collections import OrderedDict


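# SKAttention module, a subclass of nn.Module
class SKAttention(nn.Module):

    # channel: number of input feature channels; kernels: kernel sizes, one per branch;
    # reduction: channel reduction ratio; group: number of convolution groups;
    # L: lower bound on the reduced dimension d
    def __init__(self, channel=512,kernels=(1,3,5,7),reduction=16,group=1,L=32):
        super().__init__()

        # d is the larger of L and channel//reduction
        self.d=max(L,channel//reduction)

        # ModuleList holding one convolution branch per kernel size.
        # Each branch convolves the input with a different kernel size;
        # padding=k//2 keeps the spatial size unchanged for odd k.
        self.convs=nn.ModuleList([])
        for k in kernels:
            self.convs.append(
                nn.Sequential(OrderedDict([   # conv -> batch norm -> ReLU
                    ('conv',nn.Conv2d(channel,channel,kernel_size=k,padding=k//2,groups=group)),
                    ('bn',nn.BatchNorm2d(channel)),
                    ('relu',nn.ReLU())
                ]))
            )
        # fc reduces the pooled channel descriptor (channel) to the compact feature (d)
        self.fc=nn.Linear(channel,self.d)
        # one fully connected layer per branch maps d back to channel
        self.fcs=nn.ModuleList([])
        for i in range(len(kernels)):
            self.fcs.append(nn.Linear(self.d,channel))
        # softmax over dim 0, the branch dimension
        self.softmax=nn.Softmax(dim=0)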



    def forward(self, x):
        bs, c, _, _ = x.size()
        conv_outs=[]
        ### Split: apply every branch (one kernel size each, with its own weights) to the same input, giving k feature maps
        for conv in self.convs:
            conv_outs.append(conv(x))
        feats=torch.stack(conv_outs,0) # k,bs,channel,h,w

        ### Fuse: sum the k feature maps element-wise to get the fused feature U
        U=sum(conv_outs) # bs,c,h,w

        ### Global average pooling over h and w gives the channel descriptor S;
        ### the fully connected layer then reduces it to the compact feature Z
        S=U.mean(-1).mean(-1) # bs,c

        Z=self.fc(S) # bs,d

        ### calculate attention weights
        weights=[]

        # apply one fully connected layer per branch to the compact feature Z
        for fc in self.fcs:
            weight=fc(Z)
            weights.append(weight.view(bs,c,1,1)) # bs,channel,1,1
        # stack the per-branch logits and normalize them across branches with softmax
        attention_weights=torch.stack(weights,0) # k,bs,channel,1,1
        attention_weights=self.softmax(attention_weights) # k,bs,channel,1,1

        ### Select: weight each branch's feature map and sum over branches to get the final feature V
        V=(attention_weights*feats).sum(0)
        return V


if __name__ == '__main__':
    x=torch.randn(50,512,7,7)  # bs,channel,h,w
    sk = SKAttention(channel=512,reduction=8)
    output=sk(x)
    print(output.shape)  # torch.Size([50, 512, 7, 7]), same shape as the input
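
As a quick sanity check of the Select step in isolation, the sketch below replaces the branch outputs and the per-branch fully connected outputs with random tensors and confirms that a softmax over the branch dimension gives weights that sum to 1 for every sample and channel. The tensor sizes and the names `feats`, `logits`, and `attention` are illustrative assumptions, not values from the module above.

```python
import torch

torch.manual_seed(0)
k, bs, c, h, w = 3, 2, 8, 5, 5           # branches, batch, channels, height, width

feats = torch.randn(k, bs, c, h, w)      # stand-in for the stacked branch outputs
logits = torch.randn(k, bs, c, 1, 1)     # stand-in for the stacked per-branch fc outputs

# softmax over dim 0: the k branch weights of every (sample, channel) sum to 1
attention = torch.softmax(logits, dim=0)

# broadcasting over h and w, then a weighted sum over the branch dimension
V = (attention * feats).sum(0)

print(attention.sum(0).flatten()[:4])    # values close to 1
print(V.shape)                           # torch.Size([2, 8, 5, 5])
```

Since the weights are non-negative and sum to 1 across branches, every element of `V` lies between the smallest and largest value of the corresponding elements in the branch outputs.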

Origin blog.csdn.net/qq_38915354/article/details/129721180