YOLOv8/YOLOv7/YOLOv5/YOLOv4/Faster-rcnn series algorithm improvement [NO.71] attention mechanism Expectation-Maximization Attention (EMA module)

Preface
YOLOv8, one of the most advanced deep learning object detection algorithms, already incorporates a large number of tricks, but there is still room for improvement, and different improvements can target the detection difficulties of specific application scenarios. This series of articles explains in detail how to improve YOLOv8, in the hope of offering some help and reference to students doing research who need innovations and to engineers who want better results in their projects. Since YOLOv5 appeared in 2020, followed by YOLOv7 and YOLOv8, a large number of papers improving these models have been published, so for researchers and practitioners alike, further improvements on them may lack value and novelty. To keep pace with the times, future improvements will be based on YOLOv7. The earlier YOLOv5 improvement methods also apply to YOLOv7, so the numbering continues from the YOLOv5 improvement series. The improvements can also be applied to other object detection algorithms such as YOLOv5. I hope this is helpful to everyone.

1. Problem addressed

The method mainly replaces layer-by-layer stacking with parallel sub-networks, which reduces network depth while maintaining high performance. Here it is applied to an object detection algorithm to try to improve detection accuracy.
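A minimal sketch of the difference, assuming PyTorch. `SequentialBlock` and `ParallelBlock` are hypothetical toy modules, not code from the paper: the same 1x1 and 3x3 convolutions are either stacked (depth of two layers) or applied side by side to the same input and summed (depth of one layer), with an identical parameter count. The EMA code in section 3 uses the parallel form for its 1x1 and 3x3 branches.

```python
import torch
from torch import nn


class SequentialBlock(nn.Module):
    """Hypothetical toy block: 1x1 conv stacked before a 3x3 conv (depth 2)."""

    def __init__(self, c):
        super().__init__()
        self.conv1x1 = nn.Conv2d(c, c, kernel_size=1)
        self.conv3x3 = nn.Conv2d(c, c, kernel_size=3, padding=1)

    def forward(self, x):
        # the 3x3 conv has to wait for the 1x1 conv's output
        return self.conv3x3(self.conv1x1(x))


class ParallelBlock(nn.Module):
    """Hypothetical toy block: 1x1 and 3x3 convs on the same input (depth 1)."""

    def __init__(self, c):
        super().__init__()
        self.conv1x1 = nn.Conv2d(c, c, kernel_size=1)
        self.conv3x3 = nn.Conv2d(c, c, kernel_size=3, padding=1)

    def forward(self, x):
        # both branches read x directly; their outputs are summed
        return self.conv1x1(x) + self.conv3x3(x)


def count_params(module):
    return sum(p.numel() for p in module.parameters())


x = torch.randn(2, 16, 32, 32)
seq, par = SequentialBlock(16), ParallelBlock(16)
print(seq(x).shape, par(x).shape)  # torch.Size([2, 16, 32, 32]) for both
print(count_params(seq) == count_params(par))  # True
```

Because both branches read the same input, they can be computed concurrently; the trade-off is that the 3x3 branch no longer sees features already transformed by the 1x1 branch.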

2. Basic principles

Paper link: 1907.13426.pdf (arxiv.org)

Code link: GitHub - XiaLiPKU/EMANet: The code for Expectation-Maximization Attention Networks for Semantic Segmentation (ICCV'2019 Oral)

Abstract: Self-attention mechanisms have been widely used in various tasks. They compute the representation of each position as a weighted sum of the features at all positions, and can therefore capture long-range relationships in computer vision tasks. However, they are computationally expensive, since the attention maps are computed with respect to all other positions. In this paper, we formulate the attention mechanism in an expectation-maximization manner and iteratively estimate a much more compact set of bases upon which the attention maps are computed. Through a weighted summation over these bases, the resulting representation is low-rank and discards noisy information from the input. The proposed Expectation-Maximization Attention (EMA) module is robust to the variance of the input and is also memory- and computation-friendly. Furthermore, we set up bases maintenance and normalization methods to stabilize its training. We conduct extensive experiments on popular semantic segmentation benchmarks, including PASCAL VOC, PASCAL Context, and COCO Stuff, and set new records on them.
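The E-step/M-step loop described in the abstract can be sketched as follows. This is a simplified illustration assuming PyTorch, a flattened feature map `x` of shape (batch, channels, N) and `k` bases `mu`; it omits the surrounding 1x1 convolutions, batch normalization and moving-average bases maintenance, and `em_attention` is a hypothetical helper name, not the EMANet API.

```python
import torch
import torch.nn.functional as F


def em_attention(x, mu, iters=3):
    """Simplified EM attention (hypothetical helper, not the EMANet API).

    x:  (b, c, n) features at n = h * w positions
    mu: (b, c, k) initial bases
    Returns the reconstruction (b, c, n) and the refined bases (b, c, k).
    """
    assert iters >= 1, "at least one E/M iteration is needed"
    for _ in range(iters):
        # E-step: soft assignment of every position to every base, z: (b, n, k)
        z = torch.softmax(torch.bmm(x.transpose(1, 2), mu), dim=2)
        # M-step: each base becomes the z-weighted average of the features
        z_norm = z / (z.sum(dim=1, keepdim=True) + 1e-6)
        mu = torch.bmm(x, z_norm)  # (b, c, k)
        # L2-normalize each base over channels to keep the iterations stable
        mu = F.normalize(mu, dim=1)
    # Reconstruction: every position is rebuilt from only k bases (low rank)
    x_rec = torch.bmm(mu, z.transpose(1, 2))  # (b, c, n)
    return x_rec, mu


torch.manual_seed(0)
b, c, h, w, k = 2, 64, 16, 16, 8
x = torch.randn(b, c, h * w)
mu0 = F.normalize(torch.randn(b, c, k), dim=1)
x_rec, mu = em_attention(x, mu0)
print(x_rec.shape, mu.shape)  # torch.Size([2, 64, 256]) torch.Size([2, 64, 8])
```

With N positions and K bases, each iteration costs on the order of N·K per image instead of the N² of full self-attention, and the reconstruction has rank at most K, which is the low-rank, noise-suppressing property the abstract describes.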

3. How to add the module

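The EMA module code used for this improvement is as follows. Note that this code (channel grouping, directional pooling along height and width, parallel 1x1 and 3x3 branches, and cross-spatial softmax weighting) is the Efficient Multi-Scale Attention (EMA) module from "Efficient Multi-Scale Attention Module with Cross-Spatial Learning" (ICASSP 2023), not the iterative Expectation-Maximization Attention unit of EMANet; the two modules share the abbreviation EMA.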

import torch
from torch import nn

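class EMA(nn.Module):
    def __init__(self, channels, factor=8):
        super(EMA, self).__init__()
        self.groups = factor  # number of channel groups (sub-features)
        # channels must split evenly into groups, otherwise the reshapes in forward() break
        assert channels % self.groups == 0 and channels // self.groups > 0
        self.softmax = nn.Softmax(-1)
        self.agp = nn.AdaptiveAvgPool2d((1, 1))        # global average pooling
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # average over width  -> (h, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # average over height -> (1, w)
        self.gn = nn.GroupNorm(channels // self.groups, channels // self.groups)
        self.conv1x1 = nn.Conv2d(channels // self.groups, channels // self.groups, kernel_size=1, stride=1, padding=0)
        self.conv3x3 = nn.Conv2d(channels // self.groups, channels // self.groups, kernel_size=3, stride=1, padding=1)

    def forward(self, x):
        b, c, h, w = x.size()
        # fold the groups into the batch dimension so all groups are processed in parallel
        group_x = x.reshape(b * self.groups, -1, h, w)  # b*g,c//g,h,w
        # 1x1 branch: pool along each spatial direction, then share one 1x1 conv
        x_h = self.pool_h(group_x)  # b*g, c//g, h, 1
        x_w = self.pool_w(group_x).permute(0, 1, 3, 2)  # b*g, c//g, w, 1
        hw = self.conv1x1(torch.cat([x_h, x_w], dim=2))  # b*g, c//g, h+w, 1
        x_h, x_w = torch.split(hw, [h, w], dim=2)
        x1 = self.gn(group_x * x_h.sigmoid() * x_w.permute(0, 1, 3, 2).sigmoid())
        # 3x3 branch: local spatial context
        x2 = self.conv3x3(group_x)
        # cross-spatial learning: the pooled channel descriptor of each branch
        # (softmax over channels) weights the spatial map of the other branch
        x11 = self.softmax(self.agp(x1).reshape(b * self.groups, -1, 1).permute(0, 2, 1))  # b*g, 1, c//g
        x12 = x2.reshape(b * self.groups, c // self.groups, -1)  # b*g, c//g, hw
        x21 = self.softmax(self.agp(x2).reshape(b * self.groups, -1, 1).permute(0, 2, 1))  # b*g, 1, c//g
        x22 = x1.reshape(b * self.groups, c // self.groups, -1)  # b*g, c//g, hw
        weights = (torch.matmul(x11, x12) + torch.matmul(x21, x22)).reshape(b * self.groups, 1, h, w)
        # rescale the input with a sigmoid gate and restore the original shape
        return (group_x * weights.sigmoid()).reshape(b, c, h, w)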
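Before registering the module in a YOLO model, it is worth checking it on a dummy tensor. The sketch below assumes PyTorch and repeats the class so that it runs on its own (with a `channels % factor == 0` check, since the reshapes require it). `ConvEMA` is a hypothetical Conv-BN-SiLU block followed by EMA, shown only as one plausible way to attach the module after a convolution; it is not the blogger's project code or an Ultralytics module.

```python
import torch
from torch import nn


class EMA(nn.Module):
    # same module as above, condensed, with a divisibility check on the channels
    def __init__(self, channels, factor=8):
        super().__init__()
        assert channels % factor == 0, "channels must be divisible by factor"
        self.groups = factor
        cg = channels // factor
        self.softmax = nn.Softmax(-1)
        self.agp = nn.AdaptiveAvgPool2d((1, 1))
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))
        self.gn = nn.GroupNorm(cg, cg)
        self.conv1x1 = nn.Conv2d(cg, cg, kernel_size=1)
        self.conv3x3 = nn.Conv2d(cg, cg, kernel_size=3, padding=1)

    def forward(self, x):
        b, c, h, w = x.size()
        g = self.groups
        group_x = x.reshape(b * g, -1, h, w)
        x_h = self.pool_h(group_x)
        x_w = self.pool_w(group_x).permute(0, 1, 3, 2)
        x_h, x_w = torch.split(self.conv1x1(torch.cat([x_h, x_w], dim=2)), [h, w], dim=2)
        x1 = self.gn(group_x * x_h.sigmoid() * x_w.permute(0, 1, 3, 2).sigmoid())
        x2 = self.conv3x3(group_x)
        x11 = self.softmax(self.agp(x1).reshape(b * g, -1, 1).permute(0, 2, 1))
        x21 = self.softmax(self.agp(x2).reshape(b * g, -1, 1).permute(0, 2, 1))
        x12 = x2.reshape(b * g, c // g, -1)
        x22 = x1.reshape(b * g, c // g, -1)
        weights = (x11 @ x12 + x21 @ x22).reshape(b * g, 1, h, w)
        return (group_x * weights.sigmoid()).reshape(b, c, h, w)


class ConvEMA(nn.Module):
    """Hypothetical block: Conv-BN-SiLU followed by EMA (not an Ultralytics module)."""

    def __init__(self, c_in, c_out, factor=8):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, kernel_size=3, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()
        self.attn = EMA(c_out, factor)

    def forward(self, x):
        return self.attn(self.act(self.bn(self.conv(x))))


ema = EMA(256)
x = torch.randn(2, 256, 20, 24)  # non-square feature maps are fine
print(ema(x).shape)  # torch.Size([2, 256, 20, 24])
print(sum(p.numel() for p in ema.parameters()))  # 10368

block = ConvEMA(128, 256)
print(block(torch.randn(2, 128, 20, 24)).shape)  # torch.Size([2, 256, 20, 24])
```

EMA keeps the input shape, so it can be placed after any stage whose channel count is divisible by `factor`. Its parameter cost is small: with 256 channels and `factor=8` it adds 10,368 parameters, because its convolutions operate only within 32-channel groups.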

The number of layers and parameters of the improved network is as follows. The blogger trained and tested the improved model on the NWPU VHR-10 remote sensing dataset, and the experiments show an improvement. To obtain the Baidu link to the improved YOLO project, send the blogger a private message.

4. Summary

Preview: the next article will continue to share related improvement methods for deep learning algorithms. If you are interested, follow me; if you have any questions, leave a comment or send me a private message.

PS: This method can improve not only YOLOv5 but also other YOLO networks and object detection networks, such as YOLOv7, v6, v4, v3, Faster R-CNN, and SSD.

Finally, if you need the project, follow me and send me a private message. Followers can receive free learning materials on deep learning algorithms.

Origin blog.csdn.net/m0_70388905/article/details/131361576