Implementation of the CBAM Attention Mechanism

1. Introduction to CBAM

Paper: CBAM: Convolutional Block Attention Module
The paper proposes a simple but effective attention module, CBAM: given an intermediate feature map, it infers attention weights along two separate dimensions, channel and spatial, and then multiplies them with the original feature map to adaptively refine the features. Because CBAM is a lightweight, general-purpose module, it can be integrated seamlessly into any CNN architecture with negligible extra overhead and trained end-to-end together with the base CNN. On different classification and detection datasets, integrating CBAM into various models consistently improves their performance, demonstrating its wide applicability.
CBAM can be divided into two parts: channel attention module and spatial attention module, as shown in the figure below.
(Figure: overall CBAM structure, with the channel attention module followed by the spatial attention module)

  • Channel attention module: focuses on what is meaningful in the feature map.
    The input is a feature map F of shape H×W×C (in practice there may also be a batch dimension, i.e. NHWC). Global max pooling and global average pooling over the spatial dimensions first produce two 1×1×C descriptors. Both descriptors are then fed to an MLP with one hidden layer: the first layer has C/r neurons and the second has C neurons. This MLP is shared, i.e. the same weights are applied to both descriptors. The two output vectors are added element-wise and passed through a sigmoid to obtain the channel weights Mc. Finally, Mc is multiplied with the original feature F to obtain the refined feature F'.
  • Spatial attention module: focuses on where the meaningful features are.
    The input is the feature map F' of shape H×W×C (again possibly with a batch dimension). Channel-wise average pooling and max pooling first produce two H×W×1 descriptors, which are concatenated along the channel axis. A 7×7 convolution followed by a sigmoid then yields the spatial weights Ms. Finally, Ms is multiplied with F' to obtain the final attention-refined feature. (A minimal sketch of both modules follows this list.)
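To make the two steps above concrete, here is a minimal sketch of the channel and spatial branches written against the tf.keras API. The function names, the ratio argument, and the layer choices are my own illustration, not the paper's official code:

import tensorflow as tf

def channel_attention(x, ratio=8):
    # x: [N, H, W, C]; returns x re-weighted by a [N, 1, 1, C] channel mask Mc
    c = int(x.shape[-1])
    dense_1 = tf.keras.layers.Dense(c // ratio, activation='relu')  # C/r hidden units
    dense_2 = tf.keras.layers.Dense(c)
    avg = tf.reduce_mean(x, axis=[1, 2])  # global average pooling -> [N, C]
    mx = tf.reduce_max(x, axis=[1, 2])    # global max pooling     -> [N, C]
    # the same two dense layers (the shared MLP) are applied to both descriptors
    mc = tf.sigmoid(dense_2(dense_1(avg)) + dense_2(dense_1(mx)))
    return x * tf.reshape(mc, [-1, 1, 1, c])

def spatial_attention(x, kernel_size=7):
    # channel-wise average and max pooling give two [N, H, W, 1] maps
    avg = tf.reduce_mean(x, axis=-1, keepdims=True)
    mx = tf.reduce_max(x, axis=-1, keepdims=True)
    conv = tf.keras.layers.Conv2D(1, kernel_size, padding='same', activation='sigmoid')
    ms = conv(tf.concat([avg, mx], axis=-1))  # spatial mask Ms, [N, H, W, 1]
    return x * ms

# as in the paper, channel attention is applied first, then spatial attention:
# refined = spatial_attention(channel_attention(feature_map))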

2. Code implementation

Once the process above is clear, the code is easy to follow. Two implementations are given here: one adapted from another blogger's code, and the other from GitHub: kobiso/CBAM-tensorflow.
Code one:

import tensorflow as tf
import tensorflow.contrib.slim as slim  # TF 1.x API; slim is used for the 7x7 conv below


def combined_static_and_dynamic_shape(tensor):
    """Returns a list of static and dynamic values for the shape dimensions.
    This is useful to preserve static shapes when available in reshape operations.
    Args:
        tensor: A tensor of any type.
    Returns:
        A list of size tensor.shape.ndims containing integers or scalar tensors."""
    static_tensor_shape = tensor.shape.as_list()
    dynamic_tensor_shape = tf.shape(tensor)
    combined_shape = []
    for index, dim in enumerate(static_tensor_shape):
        if dim is not None:
            combined_shape.append(dim)
        else:
            combined_shape.append(dynamic_tensor_shape[index])
    return combined_shape
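# For illustration (my own note, not part of the referenced code): with an input
# placeholder of shape [None, 28, 28, 1], this helper returns
# [tf.shape(x)[0], 28, 28, 1], i.e. static ints where the shape is known and scalar
# tensors where it is not, which lets the reshapes below handle an unknown batch size.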


def convolutional_block_attention_module(feature_map, index, reduction_ratio=0.5):
    """CBAM: convolutional block attention module.
    Args:
        feature_map: input feature map (NHWC)
        index: index of the module, used to name its variable scope
        reduction_ratio: the first MLP layer has reduction_ratio * C output units
    Returns:
        feature map refined by channel and spatial attention"""

    with tf.variable_scope("cbam_%s" % (index)):
        feature_map_shape = combined_static_and_dynamic_shape(feature_map)
        # channel attention module
        channel_avg_weights = tf.nn.avg_pool(value=feature_map,
                                             ksize=[1, feature_map_shape[1], feature_map_shape[2], 1],
                                             strides=[1, 1, 1, 1],
                                             padding='VALID')  # global average pool
        channel_max_weights = tf.nn.max_pool(value=feature_map,
                                             ksize=[1, feature_map_shape[1], feature_map_shape[2], 1],
                                             strides=[1, 1, 1, 1],
                                             padding='VALID')
        channel_avg_reshape = tf.reshape(channel_avg_weights,
                                         [feature_map_shape[0], 1, feature_map_shape[3]])
        channel_max_reshape = tf.reshape(channel_max_weights,
                                         [feature_map_shape[0], 1, feature_map_shape[3]])
        # stack into [N, 2, C] so the same dense layers act on both descriptors
        channel_w_reshape = tf.concat([channel_avg_reshape, channel_max_reshape], axis=1)

        fc_1 = tf.layers.dense(inputs=channel_w_reshape,
                               units=feature_map_shape[3] * reduction_ratio,
                               name="fc_1",
                               activation=tf.nn.relu)
        fc_2 = tf.layers.dense(inputs=fc_1,
                               units=feature_map_shape[3],
                               name="fc_2",
                               activation=None)
        # sum the MLP outputs of the avg-pooled and max-pooled descriptors, then apply the sigmoid
        channel_attention = tf.reduce_sum(fc_2, axis=1, name="channel_attention_sum")
        channel_attention = tf.nn.sigmoid(channel_attention)
        channel_attention = tf.reshape(channel_attention,
                                       shape=[feature_map_shape[0], 1, 1, feature_map_shape[3]])
        feature_map_with_channel_attention = tf.multiply(feature_map, channel_attention)
        # spatial attention module
        # channel-wise average pooling, NHWC format
        channel_wise_avg_pooling = tf.reduce_mean(feature_map_with_channel_attention, axis=3)
        channel_wise_avg_pooling = tf.reshape(channel_wise_avg_pooling,
                                              shape=[feature_map_shape[0], feature_map_shape[1],
                                                     feature_map_shape[2], 1]) # shape=[batch, H, W, 1]
        # channel-wise max pooling
        channel_wise_max_pooling = tf.reduce_max(feature_map_with_channel_attention, axis=3)
        channel_wise_max_pooling = tf.reshape(channel_wise_max_pooling,
                                              shape=[feature_map_shape[0], feature_map_shape[1],
                                                     feature_map_shape[2], 1])
        # concatenate along the channel axis
        channel_wise_pooling = tf.concat([channel_wise_avg_pooling, channel_wise_max_pooling], axis=3)
        spatial_attention = slim.conv2d(channel_wise_pooling, 1, [7, 7],
                                        padding='SAME',
                                        activation_fn=tf.nn.sigmoid,
                                        scope="spatial_attention_conv")
        feature_map_with_attention = tf.multiply(feature_map_with_channel_attention, spatial_attention)
        return feature_map_with_attention

In the channel attention module, this code first stacks the two 1×1×C descriptors along a new axis and then feeds them through the MLP together, which at first made me suspect that the MLP weights were not shared as in the paper. In fact they are: the same dense layers act on both rows of the stacked tensor, and the reduce_sum afterwards adds their two outputs before the sigmoid. Due to the length limit, the second piece of code will be posted in the next blog.
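A quick check of that weight sharing (my own snippet, not taken from the referenced code): a dense layer applied to a [N, 2, C] tensor builds a single [C, units] kernel, which both stacked descriptors pass through.

import tensorflow as tf

x = tf.random.normal([4, 2, 8])   # two stacked 1x1xC descriptors per sample, C = 8
dense = tf.keras.layers.Dense(3)  # tf.layers.dense in TF 1.x builds the same kind of layer
y = dense(x)                      # the kernel acts on the last axis, i.e. on both rows
print(dense.kernel.shape)         # (8, 3): one weight matrix, shared by both descriptors
print(y.shape)                    # (4, 2, 3)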
I inserted this module into a small four-layer network of mine for handwritten-digit (MNIST) classification, but I did not see much improvement in accuracy, and it did feel noticeably slower to run. Even in the paper, where CBAM is inserted into some large networks, the performance gains do not strike me as particularly large...
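For reference, this is roughly how the module from Code one can be dropped in after each convolution of such a small TF 1.x network; the layer sizes and the fixed batch size are illustrative, not the exact network I used:

import tensorflow as tf

images = tf.placeholder(tf.float32, [32, 28, 28, 1])  # MNIST-sized input batch
conv1 = tf.layers.conv2d(images, 32, 3, padding='same', activation=tf.nn.relu)
conv1 = convolutional_block_attention_module(conv1, index=1)  # CBAM after the first conv
conv2 = tf.layers.conv2d(conv1, 64, 3, padding='same', activation=tf.nn.relu)
conv2 = convolutional_block_attention_module(conv2, index=2)  # CBAM after the second conv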

Origin blog.csdn.net/qq_43265072/article/details/106057548