[Deep Learning] (9) Mixed-Domain Attention Mechanisms (DANet, CBAM) in CNNs, with Complete TensorFlow Code

Hello everyone. Today I will show you how to use TensorFlow to build DANet and CBAM, two mixed-domain attention mechanism models. In the previous article, I introduced the CNN channel attention mechanisms SENet and ECANet. If you are interested, you can take a look: https://blog.csdn.net/dgvv4/article/details/123572065


1. Introduction to Attention Mechanism

The attention mechanism is essentially a resource allocation mechanism. It redistributes resources according to how important each target is, so that more resources go to the objects being attended to. In a convolutional neural network, the resources allocated by the attention mechanism are weights. During training, assigning larger weights to the attended objects improves the network's ability to extract their features. Adding an attention mechanism to an object detection task can improve the representational power of the model, reduce interference from irrelevant targets, improve detection of the targets of interest, and thereby improve the model's overall detection accuracy.


2. CBAM attention mechanism

2.1 Method introduction

The CBAM attention mechanism consists of a channel attention module and a spatial attention module.

Advantages of the CBAM attention mechanism:

(1) Lightweight: the CBAM module contains no large convolutional structures, only a few pooling layers and feature-fusion operations. This avoids much of the computation that convolutional multiplications bring, so the module has low complexity and a small computational cost. Experiments show that adding a CBAM module to a lightweight model brings a stable performance improvement. Given the small increase in computational cost, introducing CBAM is very cost-effective.

(2) Versatile: its structure makes CBAM highly general and portable, in two respects. First, because it is built from pooling operations, the CBAM module can be inserted directly after a convolution, so it can be added both to plain networks such as VGG and to networks with shortcut-based residual structures such as ResNet50 and MobileNetV3. Second, CBAM suits both object detection and classification tasks, and on datasets with different characteristics it can improve detection or classification accuracy.

(3) Effective: traditional CNN attention mechanisms focus on the channel domain and only consider the relationships between feature-map channels. CBAM works on both the channel and the spatial scope: it introduces channel attention and spatial attention and applies them sequentially, from channel to space. Spatial attention makes the network focus on the pixel regions that are decisive for classification and ignore irrelevant regions. Channel attention handles how weight is distributed across feature-map channels. Applying attention along both dimensions improves the performance of the model.


2.2 Network structure

(1) Channel attention mechanism

The flow chart of the channel attention module in CBAM is as follows. First, global max pooling and global average pooling are applied to the input feature map separately, compressing its spatial dimensions into two different channel descriptors. Both descriptors pass through one shared multi-layer perceptron that first reduces the number of channels and then restores it; the code implements it with two Dense layers, which on a 1×1×C tensor are equivalent to two 1×1 convolutions. The two outputs are summed element-wise with layers.Add(), and a sigmoid activation normalizes the weight of each channel to the range (0, 1). Finally, the normalized weights are multiplied with the input feature map.

Code display

import tensorflow as tf
from tensorflow.keras import layers

# (1) Channel attention
def channel_attention(inputs, ratio=0.25):
    '''ratio: factor by which the first fully connected layer reduces the number of channels'''

    channel = inputs.shape[-1]  # number of channels of the input feature map

    # Global max pooling and global average pooling over the spatial dimensions
    # [b,h,w,c]==>[b,c]
    x_max = layers.GlobalMaxPooling2D()(inputs)
    x_avg = layers.GlobalAveragePooling2D()(inputs)

    # [b,c]==>[b,1,1,c]
    x_max = layers.Reshape([1,1,-1])(x_max)  # -1 lets Keras infer the channel dimension
    x_avg = layers.Reshape([1,1,-1])(x_avg)  # the variable channel could be used instead of -1

    # Shared MLP: create the two Dense layers once and apply them to both descriptors
    # First layer reduces the channels to 1/4, with ReLU: [b,1,1,c]==>[b,1,1,c//4]
    dense_reduce = layers.Dense(int(channel*ratio), activation='relu')
    # Second layer restores the channels: [b,1,1,c//4]==>[b,1,1,c]
    dense_expand = layers.Dense(channel)

    x_max = dense_expand(dense_reduce(x_max))
    x_avg = dense_expand(dense_reduce(x_avg))

    # Sum the two results: [b,1,1,c]+[b,1,1,c]==>[b,1,1,c]
    x = layers.Add()([x_max, x_avg])

    # Normalize the weights with sigmoid
    x = tf.nn.sigmoid(x)

    # Multiply the input feature map by the weight vector to weight each channel
    x = layers.Multiply()([inputs, x])  # [b,h,w,c]*[b,1,1,c]==>[b,h,w,c]

    return x
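To make the weight sharing explicit independently of Keras, here is a minimal NumPy sketch of the same channel-attention computation for a single feature map (no batch axis). It is an illustration, not the author's code: `cbam_channel_weights` is a hypothetical helper, and the random matrices stand in for trained Dense kernels (a Dense layer applied to a 1×1×C tensor is just a matrix product plus a bias).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cbam_channel_weights(feat, w1, b1, w2, b2):
    """Channel weights for one feature map `feat` of shape [h, w, c].

    (w1, b1) map c -> c_r with ReLU, (w2, b2) map c_r -> c. The same
    weights are used for both pooled descriptors (shared MLP).
    """
    x_max = feat.max(axis=(0, 1))   # global max pooling     -> [c]
    x_avg = feat.mean(axis=(0, 1))  # global average pooling -> [c]

    def shared_mlp(v):
        hidden = np.maximum(v @ w1 + b1, 0.0)  # Dense + ReLU
        return hidden @ w2 + b2                # Dense

    return sigmoid(shared_mlp(x_max) + shared_mlp(x_avg))  # [c], one weight per channel

rng = np.random.default_rng(0)
h, w, c, ratio = 26, 26, 512, 0.25
c_r = int(c * ratio)
feat = rng.standard_normal((h, w, c))
w1, b1 = 0.05 * rng.standard_normal((c, c_r)), np.zeros(c_r)
w2, b2 = 0.05 * rng.standard_normal((c_r, c)), np.zeros(c)

weights = cbam_channel_weights(feat, w1, b1, w2, b2)
out = feat * weights  # [h, w, c] * [c] broadcasts: every channel scaled by its weight
print(weights.shape, out.shape)  # (512,) (26, 26, 512)
```

Because both pooled vectors pass through the same `shared_mlp`, the channel branch needs only one set of MLP parameters.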

(2) Spatial attention mechanism

The spatial attention module in CBAM is as follows. It processes the output feature map of the channel attention module in the spatial domain. First, max pooling and average pooling are applied along the channel dimension, and the two resulting maps are stacked along the channel dimension with layers.concatenate(). A 1×1 convolution then reduces them to a single channel (the original CBAM paper uses a 7×7 convolution at this step), and a sigmoid function normalizes the weights. Finally, the normalized weights are multiplied with the input feature map.

Code display

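import tensorflow as tf
from tensorflow.keras import layers

# (2) Spatial attention
def spatial_attention(inputs):

    # Max pooling and average pooling along the channel dimension: [b,h,w,c]==>[b,h,w,1]
    # (with keepdims=False the result would be [b,h,w] instead)
    x_max = tf.reduce_max(inputs, axis=3, keepdims=True)   # maximum over channels
    x_avg = tf.reduce_mean(inputs, axis=3, keepdims=True)  # axis=-1 works as well

    # Stack along the channel dimension: [b,h,w,2]
    x = layers.concatenate([x_max, x_avg])

    # 1x1 convolution reduces to one channel: [b,h,w,1]
    x = layers.Conv2D(filters=1, kernel_size=(1,1), strides=1, padding='same')(x)

    # Normalize the weights with sigmoid
    x = tf.nn.sigmoid(x)

    # Multiply the input feature map by the spatial weights
    x = layers.Multiply()([inputs, x])

    return x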
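A NumPy sketch of the spatial branch, assuming a single feature map without the batch axis; `cbam_spatial_weights` and the kernel values 0.5, -0.3 and 0.1 are hypothetical. It shows that a 1×1 convolution with one filter over the stacked [max, avg] map is only a per-pixel weighted sum of the two descriptors, so each weight ignores neighboring pixels; the CBAM paper uses a 7×7 kernel here, which gives each weight a wider view of its neighborhood.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cbam_spatial_weights(feat, k_max, k_avg, bias):
    """Spatial weights for one feature map `feat` of shape [h, w, c]."""
    x_max = feat.max(axis=-1)   # max over channels  -> [h, w]
    x_avg = feat.mean(axis=-1)  # mean over channels -> [h, w]
    # A 1x1 conv with one filter over the 2-channel [max, avg] map is a per-pixel
    # weighted sum of the two descriptors plus a bias: no neighboring pixels are used
    return sigmoid(k_max * x_max + k_avg * x_avg + bias)[..., None]  # [h, w, 1]

rng = np.random.default_rng(1)
feat = rng.standard_normal((26, 26, 512))
weights = cbam_spatial_weights(feat, k_max=0.5, k_avg=-0.3, bias=0.1)
out = feat * weights  # [h, w, c] * [h, w, 1]: one weight per pixel, shared by all channels

# Same result via the "concatenate, then 1x1 conv" route used in the Keras code
stacked = np.stack([feat.max(axis=-1), feat.mean(axis=-1)], axis=-1)  # [h, w, 2]
conv1x1 = stacked @ np.array([0.5, -0.3]) + 0.1                       # [h, w]
print(weights.shape, out.shape)  # (26, 26, 1) (26, 26, 512)
```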

(3) Overall process

The overall flow chart of CBAM is as follows. The input feature map first goes through the channel attention module, where the channel weights are multiplied with the input feature map. The result is then sent to the spatial attention module, where the normalized spatial weights are multiplied with that module's input to obtain the final feature map.

Complete code display

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, Model

# (1) Channel attention
def channel_attention(inputs, ratio=0.25):
    '''ratio: factor by which the first fully connected layer reduces the number of channels'''

    channel = inputs.shape[-1]  # number of channels of the input feature map

    # Global max pooling and global average pooling: [b,h,w,c]==>[b,c]
    x_max = layers.GlobalMaxPooling2D()(inputs)
    x_avg = layers.GlobalAveragePooling2D()(inputs)

    # [b,c]==>[b,1,1,c]; -1 lets Keras infer the channel dimension
    x_max = layers.Reshape([1,1,-1])(x_max)
    x_avg = layers.Reshape([1,1,-1])(x_avg)

    # Shared MLP: the same two Dense layers process both descriptors
    # [b,1,1,c]==>[b,1,1,c//4]==>[b,1,1,c]
    dense_reduce = layers.Dense(int(channel*ratio), activation='relu')
    dense_expand = layers.Dense(channel)
    x_max = dense_expand(dense_reduce(x_max))
    x_avg = dense_expand(dense_reduce(x_avg))

    # Sum the two results: [b,1,1,c]+[b,1,1,c]==>[b,1,1,c]
    x = layers.Add()([x_max, x_avg])

    # Normalize the weights with sigmoid
    x = tf.nn.sigmoid(x)

    # Weight each channel of the input: [b,h,w,c]*[b,1,1,c]==>[b,h,w,c]
    x = layers.Multiply()([inputs, x])

    return x

# (2) Spatial attention
def spatial_attention(inputs):

    # Max pooling and average pooling along the channel dimension: [b,h,w,c]==>[b,h,w,1]
    x_max = tf.reduce_max(inputs, axis=3, keepdims=True)
    x_avg = tf.reduce_mean(inputs, axis=3, keepdims=True)

    # Stack along the channel dimension: [b,h,w,2]
    x = layers.concatenate([x_max, x_avg])

    # 1x1 convolution reduces to one channel: [b,h,w,1]
    x = layers.Conv2D(filters=1, kernel_size=(1,1), strides=1, padding='same')(x)

    # Normalize the weights with sigmoid
    x = tf.nn.sigmoid(x)

    # Multiply the input feature map by the spatial weights
    x = layers.Multiply()([inputs, x])

    return x

# (3) CBAM attention
def CBAM_attention(inputs):

    # Channel attention first, then spatial attention
    x = channel_attention(inputs)
    x = spatial_attention(x)
    return x

# (4) Build the model
if __name__ == '__main__':

    # Input layer
    inputs = keras.Input(shape=[26,26,512])
    # CBAM attention
    x = CBAM_attention(inputs)
    # Build the model
    model = Model(inputs, x)
    # Print the model structure
    model.summary()

The parameter counts are as follows

Total params: 131,715
Trainable params: 131,715
Non-trainable params: 0
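The CBAM parameter count can be reproduced by hand, which also puts a number on the "lightweight" claim in section 2.1. This is a small sketch; the helper names are hypothetical and the formulas assume Keras defaults (bias enabled on every layer).

```python
def dense_params(n_in, n_out):
    return n_in * n_out + n_out            # kernel + bias

def conv2d_params(k, n_in, n_out):
    return k * k * n_in * n_out + n_out    # k x k kernel + bias

def cbam_params(c, ratio=0.25, spatial_kernel=1, shared_mlp=True):
    c_r = int(c * ratio)
    mlp = dense_params(c, c_r) + dense_params(c_r, c)  # reduce, then expand
    if not shared_mlp:
        mlp *= 2                                        # one MLP per pooled branch
    spatial = conv2d_params(spatial_kernel, 2, 1)      # [max, avg] -> 1 channel
    return mlp + spatial

print(cbam_params(512))                    # 131715  (shared MLP, 1x1 spatial conv)
print(cbam_params(512, shared_mlp=False))  # 263427  (two separate MLPs)
print(conv2d_params(3, 512, 512))          # 2359808 (one plain 3x3 conv, for scale)
```

Sharing the MLP halves the cost of the channel branch, and even the unshared variant stays roughly an order of magnitude below a single ordinary 3×3 convolution on the same 512 channels.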

3. DANet attention mechanism

The DANet attention mechanism consists of a position attention module and a channel attention module.

The position attention module captures the spatial dependency between any two positions of the feature map, so similar features are related to each other regardless of their distance. The channel attention module integrates related features across all channel maps, selectively emphasizing channel maps that depend on each other.


3.1 Position Attention Module

The flow chart of the position attention module is as follows.

(1) The input feature map A (C×H×W) first passes through 3 convolutional layers to obtain 3 feature maps B, C and D, which are then reshaped into matrices of size (number of channels)×N, where N = H×W. In the code, B and C are reduced to C/8 channels while D keeps all C channels.

(2) The transpose of the reshaped B (N×C/8) is matrix-multiplied (tf.matmul()) with the reshaped C (C/8×N), and a softmax turns the product into the normalized position attention map S (N×N).

(3) The reshaped D (C×N) is matrix-multiplied (tf.matmul()) with the transpose of S (N×N), multiplied by the scale coefficient α, and reshaped back to the original shape. α is initialized to 0 and gradually learns larger weights during training.

(4) Finally, the result is added to the input feature map A with layers.add() to obtain the final output E.
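The four steps can be checked with a NumPy sketch that works on already-reshaped matrices (channels × N). It assumes small illustrative sizes and random stand-ins for the outputs of the three convolutions; `position_attention_np` is a hypothetical name. Column j of E equals α·Σᵢ s_ji·Dᵢ + A_j, so every position aggregates features from all positions, weighted by row j of S, and with α = 0 (its initial value) the module returns A unchanged.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def position_attention_np(A, B, C, D, alpha):
    """A, D: [C_in, N]; B, C: [C_red, N] (already reshaped, N = H*W)."""
    S = softmax(B.T @ C, axis=-1)   # step 2: [N, N], each row sums to 1
    E = alpha * (D @ S.T) + A       # steps 3-4: [C_in, N]
    return E, S

rng = np.random.default_rng(2)
c_in, c_red, h, w = 64, 8, 6, 5
n = h * w
A = rng.standard_normal((c_in, n))
B = rng.standard_normal((c_red, n))   # stand-ins for the three conv outputs
C = rng.standard_normal((c_red, n))
D = rng.standard_normal((c_in, n))

E, S = position_attention_np(A, B, C, D, alpha=0.0)
print(S.shape, E.shape)  # (30, 30) (64, 30)
```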

Code display

import tensorflow as tf
from tensorflow.keras import layers

# Trainable scalar (alpha in the paper), initialized to 0.
# add_weight() registers it as a model weight; a bare tf.Variable created
# inside a function is not tracked by Keras and would never be trained.
class Scale(layers.Layer):
    def build(self, input_shape):
        self.gamma = self.add_weight(name='gamma', shape=(1,), initializer='zeros', trainable=True)

    def call(self, inputs):
        return inputs * self.gamma

# Position attention
def position_attention(inputs):
    # Shape of the input feature map
    b, h, w, c = inputs.shape

    # Three 1x1 depthwise-separable convolutions produce B, C and D
    # B, C: [b,h,w,c]==>[b,h,w,c//8];  D: [b,h,w,c]==>[b,h,w,c]
    feat_b = layers.SeparableConv2D(filters=c//8, kernel_size=(1,1), strides=1, padding='same')(inputs)
    feat_c = layers.SeparableConv2D(filters=c//8, kernel_size=(1,1), strides=1, padding='same')(inputs)
    feat_d = layers.SeparableConv2D(filters=c, kernel_size=(1,1), strides=1, padding='same')(inputs)

    # Reshape to channels x N with N = h*w: [b,h,w,k]==>[b,k,h,w]==>[b,k,h*w]
    feat_b = tf.reshape(tf.transpose(feat_b, perm=[0,3,1,2]), shape=[-1,c//8,h*w])
    feat_c = tf.reshape(tf.transpose(feat_c, perm=[0,3,1,2]), shape=[-1,c//8,h*w])
    feat_d = tf.reshape(tf.transpose(feat_d, perm=[0,3,1,2]), shape=[-1,c,h*w])

    # S = softmax(B^T @ C): [b,h*w,c//8] @ [b,c//8,h*w] ==> [b,h*w,h*w]
    s = tf.nn.softmax(tf.transpose(feat_b, perm=[0,2,1]) @ feat_c)

    # D @ S^T: [b,c,h*w] @ [b,h*w,h*w] ==> [b,c,h*w]
    x = feat_d @ tf.transpose(s, perm=[0,2,1])

    # Back to the input layout: [b,c,h*w]==>[b,c,h,w]==>[b,h,w,c]
    x = tf.transpose(tf.reshape(x, shape=[-1,c,h,w]), perm=[0,2,3,1])
    # Multiply by the trainable scale alpha
    x = Scale()(x)

    # Add the input feature map (residual connection)
    x = layers.add([x, inputs])
    return x
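One practical consequence of step (2): S holds N×N = (H·W)² values per image, so its size grows with the fourth power of the feature-map side length, while the channel attention map of section 3.2 is always C×C. A quick count for the 26×26×512 input used in this article (`attention_map_entries` is a hypothetical helper):

```python
def attention_map_entries(h, w, c):
    n = h * w
    return {"position (N x N)": n * n, "channel (C x C)": c * c}

print(attention_map_entries(26, 26, 512))
# {'position (N x N)': 456976, 'channel (C x C)': 262144}
print(attention_map_entries(52, 52, 512)["position (N x N)"])  # 7311616
```

Doubling the side length to 52×52 multiplies the position map by 16, which is worth keeping in mind before placing the module on high-resolution feature maps.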

3.2 Channel Attention Module

The flow chart of the channel attention module is as follows. Unlike the position attention module, it applies no convolutions before computing the attention map.

(1) Reshape feature map A to C×N, and also reshape and transpose it to N×C;

(2) Matrix-multiply the two (tf.matmul()), the reshaped A (C×N) by its transpose (N×C), and apply softmax to obtain the normalized channel attention map X (C×C);

(3) Matrix-multiply X (C×C) with the reshaped feature map A (C×N), multiply by the scale coefficient β, and reshape the result back to the original shape. β is initialized to 0 and gradually learns larger weights during training;

(4) Finally, add the result to the input feature map A to obtain the final output feature map E.
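A matching NumPy sketch for the channel attention module, assuming the row-normalized form (each row of X sums to 1 and the output is X·A); `channel_attention_np` is a hypothetical name and the sizes are illustrative. The order of the product matters: A·Aᵀ is C×C (channel affinities), whereas Aᵀ·A would be N×N, which is the shape of a position attention map instead.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def channel_attention_np(A, beta):
    """A: [C, N] (feature map reshaped, N = H*W)."""
    X = softmax(A @ A.T, axis=-1)  # step 2: [C, C] channel affinities, rows sum to 1
    E = beta * (X @ A) + A         # steps 3-4: [C, N]
    return E, X

rng = np.random.default_rng(3)
c, h, w = 16, 6, 5
A = rng.standard_normal((c, h * w))
E, X = channel_attention_np(A, beta=0.5)
print(X.shape, E.shape)  # (16, 16) (16, 30)
```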

Code display

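import tensorflow as tf
from tensorflow.keras import layers

# Trainable scalar (beta in the paper), initialized to 0.
# add_weight() registers it as a model weight; a bare tf.Variable created
# inside a function is not tracked by Keras and would never be trained.
class Scale(layers.Layer):
    def build(self, input_shape):
        self.gamma = self.add_weight(name='gamma', shape=(1,), initializer='zeros', trainable=True)

    def call(self, inputs):
        return inputs * self.gamma

# Channel attention
def channel_attention(inputs):
    # Shape of the input feature map
    b, h, w, c = inputs.shape

    # Reorder axes [b,h,w,c]==>[b,c,h,w]
    x = tf.transpose(inputs, perm=[0,3,1,2])  # perm gives the new axis order
    # Reshape [b,c,h,w]==>[b,c,h*w]  (A as C x N)
    x_reshape = tf.reshape(x, shape=[-1,c,h*w])

    # Transpose [b,c,h*w]==>[b,h*w,c]  (A^T as N x C)
    x_reshape_trans = tf.transpose(x_reshape, perm=[0,2,1])
    # A @ A^T: [b,c,h*w] @ [b,h*w,c] ==> [b,c,c]
    x_matmul = x_reshape @ x_reshape_trans
    # Softmax gives the channel attention map X
    x_matmul = tf.nn.softmax(x_matmul)

    # X @ A: [b,c,c] @ [b,c,h*w] ==> [b,c,h*w]
    x = x_matmul @ x_reshape
    # Reshape [b,c,h*w]==>[b,c,h,w]
    x = tf.reshape(x, shape=[-1,c,h,w])
    # Reorder axes [b,c,h,w]==>[b,h,w,c]
    x = tf.transpose(x, perm=[0,2,3,1])
    # Multiply by the trainable scale beta
    x = Scale()(x)

    # Add the input feature map
    x = layers.add([x, inputs])

    return x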

3.3 Overall Process

The overall flow chart of DANet is as follows. The input feature map passes through the position attention module and the channel attention module in parallel, and the two output feature maps are summed with layers.add() to obtain the final output feature map.

Complete code display

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, Model

# Trainable scalar (alpha / beta in the paper), initialized to 0.
# add_weight() registers it as a model weight; a bare tf.Variable created
# inside a function is not tracked by Keras and would never be trained.
class Scale(layers.Layer):
    def build(self, input_shape):
        self.gamma = self.add_weight(name='gamma', shape=(1,), initializer='zeros', trainable=True)

    def call(self, inputs):
        return inputs * self.gamma

# (1) Channel attention
def channel_attention(inputs):
    # Shape of the input feature map
    b, h, w, c = inputs.shape

    # Reorder axes [b,h,w,c]==>[b,c,h,w]
    x = tf.transpose(inputs, perm=[0,3,1,2])  # perm gives the new axis order
    # Reshape [b,c,h,w]==>[b,c,h*w]  (A as C x N)
    x_reshape = tf.reshape(x, shape=[-1,c,h*w])

    # Transpose [b,c,h*w]==>[b,h*w,c]  (A^T as N x C)
    x_reshape_trans = tf.transpose(x_reshape, perm=[0,2,1])
    # A @ A^T: [b,c,h*w] @ [b,h*w,c] ==> [b,c,c]
    x_matmul = x_reshape @ x_reshape_trans
    # Softmax gives the channel attention map X
    x_matmul = tf.nn.softmax(x_matmul)

    # X @ A: [b,c,c] @ [b,c,h*w] ==> [b,c,h*w]
    x = x_matmul @ x_reshape
    # Reshape [b,c,h*w]==>[b,c,h,w]
    x = tf.reshape(x, shape=[-1,c,h,w])
    # Reorder axes [b,c,h,w]==>[b,h,w,c]
    x = tf.transpose(x, perm=[0,2,3,1])
    # Multiply by the trainable scale beta
    x = Scale()(x)

    # Add the input feature map
    x = layers.add([x, inputs])

    return x

# (2) Position attention
def position_attention(inputs):
    # Shape of the input feature map
    b, h, w, c = inputs.shape

    # Three 1x1 depthwise-separable convolutions produce B, C and D
    # B, C: [b,h,w,c]==>[b,h,w,c//8];  D: [b,h,w,c]==>[b,h,w,c]
    feat_b = layers.SeparableConv2D(filters=c//8, kernel_size=(1,1), strides=1, padding='same')(inputs)
    feat_c = layers.SeparableConv2D(filters=c//8, kernel_size=(1,1), strides=1, padding='same')(inputs)
    feat_d = layers.SeparableConv2D(filters=c, kernel_size=(1,1), strides=1, padding='same')(inputs)

    # Reshape to channels x N with N = h*w: [b,h,w,k]==>[b,k,h,w]==>[b,k,h*w]
    feat_b = tf.reshape(tf.transpose(feat_b, perm=[0,3,1,2]), shape=[-1,c//8,h*w])
    feat_c = tf.reshape(tf.transpose(feat_c, perm=[0,3,1,2]), shape=[-1,c//8,h*w])
    feat_d = tf.reshape(tf.transpose(feat_d, perm=[0,3,1,2]), shape=[-1,c,h*w])

    # S = softmax(B^T @ C): [b,h*w,c//8] @ [b,c//8,h*w] ==> [b,h*w,h*w]
    s = tf.nn.softmax(tf.transpose(feat_b, perm=[0,2,1]) @ feat_c)

    # D @ S^T: [b,c,h*w] @ [b,h*w,h*w] ==> [b,c,h*w]
    x = feat_d @ tf.transpose(s, perm=[0,2,1])

    # Back to the input layout: [b,c,h*w]==>[b,c,h,w]==>[b,h,w,c]
    x = tf.transpose(tf.reshape(x, shape=[-1,c,h,w]), perm=[0,2,3,1])
    # Multiply by the trainable scale alpha
    x = Scale()(x)

    # Add the input feature map (residual connection)
    x = layers.add([x, inputs])
    return x

# (3) DANet architecture
def danet(inputs):

    # Two parallel branches
    x1 = channel_attention(inputs)   # channel attention
    x2 = position_attention(inputs)  # position attention

    # Sum the outputs of the two attention modules
    x = layers.add([x1,x2])
    return x

# Build the network
if __name__ == '__main__':

    # Input layer
    inputs = keras.Input(shape=[26,26,512])
    # Output of the DANet attention
    outputs = danet(inputs)

    # Build the model
    model = Model(inputs, outputs)
    # Print the model structure
    model.summary()

View network parameters

Total params: 329,858
Trainable params: 329,858
Non-trainable params: 0
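Only the 1×1 separable convolutions of the position attention module contribute noticeably to these totals; α and β add one parameter each when they are created as trainable Keras weights. A sketch of the per-layer arithmetic, assuming Keras' SeparableConv2D layout with depth_multiplier=1 and bias enabled; the helper names are hypothetical.

```python
def separable_conv_params(k, n_in, n_out, depth_multiplier=1):
    depthwise = k * k * n_in * depth_multiplier   # one k x k filter per input channel
    pointwise = n_in * depth_multiplier * n_out    # 1x1 conv that mixes channels
    return depthwise + pointwise + n_out           # + bias

def conv2d_params(k, n_in, n_out):
    return k * k * n_in * n_out + n_out            # k x k kernel + bias

c = 512
print(separable_conv_params(1, c, c // 8))  # 33344  (B or C branch)
print(separable_conv_params(1, c, c))       # 263168 (D branch)
print(conv2d_params(1, c, c // 8))          # 32832  (plain 1x1 Conv2D, for comparison)
```

A 1×1 SeparableConv2D is slightly larger than a plain 1×1 Conv2D, because its depthwise part is only a per-channel scale; the savings of the separable form appear with larger kernels such as 3×3.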

Origin blog.csdn.net/dgvv4/article/details/123888724