YOLOv8/YOLOv7/YOLOv5/YOLOv4/Faster R-CNN series algorithm improvement [NO.72] Attention mechanism: Linear Context Transform Block (LCT module)

Preface
YOLOv8, as one of the most advanced deep learning object detection algorithms at present, already incorporates a large number of tricks, but there is still room for improvement. Different improvement methods can be applied to the detection difficulties of specific application scenarios. The following series of articles explains in detail how to improve YOLOv8, with the aim of providing some modest help and reference to students engaged in scientific research who need innovation points, as well as to friends working on engineering projects who want better results. Since the emergence of YOLOv5 in 2020, followed by YOLOv7 and YOLOv8, a large number of improvement papers have appeared, so for both researchers and practitioners the bar for research value and novelty keeps rising. To keep pace with the times, future improved algorithms in this series will be based on YOLOv7; the earlier YOLOv5 improvement methods also apply to YOLOv7, so the serial numbering of the YOLOv5 improvement series is continued. In addition, the improvement described here can also be applied to other object detection algorithms such as YOLOv5. I hope this is helpful to everyone.

Link: https://pan.baidu.com/s/1fN07LssywnP_CFDZGPcK7A

Extraction code: available by private message after following

1. Problem to solve

The SE block models global context with channel attention, but its gating can be disturbed by irrelevant channels. The LCT block proposed in this paper divides the channels into groups, normalizes the globally aggregated context within each group to reduce interference from irrelevant channels, and then models the global context of each channel independently through a per-channel linear transform, all at almost negligible parameter and computational cost. Try using the proposed module to improve the object detection algorithm and improve the detection effect.

2. Basic principles

Original link: Linear Context Transform Block (arxiv.org)

  Abstract: The Squeeze-and-Excitation (SE) block presents a channel attention mechanism that models global context by explicitly capturing dependencies between channels. However, we still know very little about how the SE block actually works. This study first revisits the SE block and then conducts a detailed empirical study of the relationship between global context and attention distribution, based on which a simple yet effective module called the Linear Context Transform (LCT) block is proposed. We divide all channels into different groups and normalize the globally aggregated context features within each channel group, reducing interference from irrelevant channels. By linearly transforming the normalized context features, we model the global context for each channel independently. The LCT block is extremely lightweight and easy to plug into different backbone models, while adding an almost negligible parameter and computational burden. Extensive experiments show that LCT blocks outperform SE blocks in image classification on ImageNet as well as in object detection/segmentation on the COCO dataset, regardless of the capacity of the backbone model. Moreover, LCT achieves consistent performance gains on existing state-of-the-art detection architectures: on the COCO benchmark, AP^bbox improves by 1.5%~1.7% and AP^mask by 1.0%~1.2%, irrespective of the capacity of the baseline model. We hope that our simple yet effective approach can shed some light on future research on attention-based models.
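
In formulas, the computation described in the abstract (and implemented in the reference code in the next section) can be sketched as follows; the notation here is mine, not the paper's. For an input feature map $X \in \mathbb{R}^{C \times H \times W}$ divided into $G$ channel groups:

z_c = \frac{1}{HW}\sum_{i=1}^{H}\sum_{j=1}^{W} x_{c,i,j}          % global average pooling per channel
\hat{z}_c = \frac{z_c - \mu_g}{\sqrt{\sigma_g^2 + \epsilon}}      % normalize within the channel group g containing c
a_c = \operatorname{sigmoid}\left( w_c \hat{z}_c + b_c \right)    % per-channel linear transform and gate
y_{c,i,j} = a_c \, x_{c,i,j}                                      % rescale the input feature map

where $\mu_g$ and $\sigma_g^2$ are the mean and variance of $z$ over group $g$, and $w, b \in \mathbb{R}^{C}$ are learnable per-channel parameters.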

3. How to add the module

A reference implementation of the module proposed in the original paper is as follows:

import torch
import torch.nn as nn


class LCT(nn.Module):
    """Linear Context Transform block: grouped normalization of the global
    context followed by a per-channel linear transform and sigmoid gate."""

    def __init__(self, channels, groups, eps=1e-5):
        super().__init__()
        assert channels % groups == 0, "Number of channels should be evenly divisible by the number of groups"
        self.groups = groups
        self.channels = channels
        self.eps = eps
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))  # global average pooling
        self.w = nn.Parameter(torch.ones(channels))   # per-channel scale
        self.b = nn.Parameter(torch.zeros(channels))  # per-channel bias
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        batch_size = x.shape[0]
        # Aggregate global context per channel, then split channels into groups
        y = self.avgpool(x).view(batch_size, self.groups, -1)
        # Normalize the context features within each channel group
        mean = y.mean(dim=-1, keepdim=True)
        mean_x2 = (y ** 2).mean(dim=-1, keepdim=True)
        var = mean_x2 - mean ** 2
        y_norm = (y - mean) / torch.sqrt(var + self.eps)
        y_norm = y_norm.reshape(batch_size, self.channels, 1, 1)
        # Per-channel linear transform, then sigmoid to obtain attention weights
        y_norm = self.w.reshape(1, -1, 1, 1) * y_norm + self.b.reshape(1, -1, 1, 1)
        y_norm = self.sigmoid(y_norm)
        # Rescale the input feature map channel-wise
        return x * y_norm.expand_as(x)
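
As a quick sanity check, the block can be run standalone; the shapes below are arbitrary examples for illustration, not values from the paper:

# Standalone test of the LCT block (example shapes chosen for illustration)
lct = LCT(channels=64, groups=8)
x = torch.randn(2, 64, 40, 40)  # batch of 2 feature maps with 64 channels
y = lct(x)
print(y.shape)  # torch.Size([2, 64, 40, 40]), same shape as the input

Because LCT only rescales channels, its output always has the same shape as its input, which is what makes it easy to drop in after any convolutional block without touching the rest of the network.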

[Model summary screenshot omitted: the layer and parameter counts of the improved network.] Because the LCT block is so lightweight, the increase in parameters and computation is almost negligible. The blogger trained and tested on the NWPU VHR-10 remote sensing dataset, and the experiments showed an improvement in detection results. To obtain the complete improved YOLO project, private message the blogger for the Baidu network disk link.
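
As for where to plug the module in, a common pattern is to append LCT to the output of an existing convolution block. The ConvLCT class below is a hypothetical sketch for illustration, not the exact code of the improved project; in YOLOv5-style repositories the new module additionally has to be registered in models/common.py and referenced in the model yaml:

# Hypothetical integration sketch: Conv-BN-SiLU followed by LCT.
# Not the exact code of the improved YOLO project.
class ConvLCT(nn.Module):
    def __init__(self, c1, c2, k=3, s=1, groups=8):
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = nn.SiLU()
        self.lct = LCT(c2, groups)  # channel attention on the block's output

    def forward(self, x):
        return self.lct(self.act(self.bn(self.conv(x))))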

4. Summary

A preview: the next article will continue to share improvement methods related to deep learning algorithms. Interested friends can follow me, and if you have any questions, you are welcome to leave a comment or message me privately.

PS: This method is not only suitable for improving YOLOv5; it can also be used to improve other YOLO networks and object detection networks, such as YOLOv7, YOLOv6, YOLOv4, YOLOv3, Faster R-CNN, SSD, etc.

Finally, if you need the materials, please follow me and send a private message. Followers can receive free learning materials on deep learning algorithms!

YOLO series algorithm improvement method | Directory list


Origin blog.csdn.net/m0_70388905/article/details/131629308