YOLOv8/YOLOv7/YOLOv5/YOLOv4/Faster R-CNN series algorithm improvements [NO.69]: improving small-target detection in remote sensing images with CATNet (ContextAggregation module)

Preface
YOLOv8 is among the most advanced deep learning object detection algorithms and already bundles a large number of tricks, but there is still room for improvement, and different improvement methods can be applied to the detection difficulties of specific application scenarios. The following series of articles explains in detail how to improve YOLOv8, in the hope of offering some modest help and reference to students who need innovations for their research and to friends working on engineering projects who want better results. Since YOLOv5 appeared in 2020, followed by YOLOv7 and YOLOv8, a large number of improvement papers have been published, and for both students and working engineers the value and novelty of such research are no longer sufficient. To keep pace with the times, later improvements will be based on YOLOv7. The earlier YOLOv5 improvement methods also apply to YOLOv7, so the numbering of the YOLOv5 improvement series is continued here. The improvements can also be applied to other object detection algorithms, such as YOLOv5. I hope this is helpful to everyone.

1. Problem addressed

The method in this article is mainly intended for remote sensing image detection. It uses the Context Aggregation Network (CATNet) to improve the feature extraction process, which has some benefit for small targets in remote sensing images. Here we try introducing it into the YOLO series of algorithms to improve the detection results.

2. Basic principles

Original paper: 2111.11057.pdf (arxiv.org)

Code link: CATNet/scp.py at main yeliudev/CATNet GitHub

Abstract: The task of instance segmentation in remote sensing images aims to perform instance-level, pixel-level labeling, which is important for various civilian applications. Despite previous successes, most existing instance segmentation methods designed for natural images suffer from severe performance degradation when directly applied to top-down remote sensing images. After careful analysis, we find that these challenges mainly come from the lack of discriminative object features, which are affected by severe scale variation, low contrast and clustered distribution. To address these issues, a novel Context Aggregation Network (CATNet) is proposed to improve the feature extraction process. The proposed model utilizes three lightweight plug-and-play modules, namely Dense Feature Pyramid Network (DenseFPN), Spatial Context Pyramid (SCP) and Hierarchical Region-of-Interest Extractor (HRoIE), to aggregate global visual context in the feature, spatial and instance domains, respectively. DenseFPN is a multi-scale feature propagation module that establishes a more flexible information flow by adopting intra-layer residual connections, inter-layer dense connections, and a feature reweighting strategy. Utilizing an attention mechanism, SCP further enhances features by aggregating global spatial context into local regions. For each instance, HRoIE adaptively generates RoI features for different downstream tasks. We extensively evaluate the proposed scheme on the challenging iSAID, DIOR, NWPU VHR-10 and HRSID datasets. The evaluation results show that the proposed method outperforms the existing state of the art at similar computational cost.

Extending the notion of context and explicitly separating it into feature, spatial and instance domains leads to superior performance in remote sensing image segmentation. To the authors' knowledge, this is the first study that considers global visual context beyond spatial dependencies. The proposed CATNet uses DenseFPN, SCP and HRoIE to learn and aggregate global visual context from different domains for object detection and instance segmentation in remote sensing images. The proposed scheme has been tested on various datasets, including iSAID, DIOR, NWPU VHR-10 and HRSID, and achieved new state-of-the-art performance at similar computational cost.

The proposed method is extensively evaluated on the iSAID, DIOR, NWPU VHR-10 and HRSID datasets. The modules are evaluated first on the instance segmentation task on the iSAID dataset, and then on the object detection task on the DIOR and NWPU VHR-10 datasets to demonstrate their effectiveness on optical remote sensing images. The generalization ability on SAR images is also validated using the HRSID dataset.
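
To make the attention step concrete, the computation performed by the ContextAggregation code listed in section 3 can be written out as below. This is read off that code rather than quoted from the paper, and the notation is introduced here only for illustration: x_j is the feature vector at spatial position j, W_a, W_k, W_v, W_m are the four 1x1 convolutions (biases omitted), and sigma is the sigmoid.

```latex
\begin{aligned}
\alpha_j &= \frac{\exp(W_k x_j)}{\sum_{l=1}^{HW} \exp(W_k x_l)}
  && \text{(softmax over all } HW \text{ positions)} \\
c &= \sum_{j=1}^{HW} \alpha_j \, W_v x_j
  && \text{(one global context vector per image)} \\
y_i &= x_i + \sigma(W_a x_i) \, W_m c
  && \text{(context gated per position, added residually)}
\end{aligned}
```

In words: a single global context vector is pooled from the whole feature map, and each position decides through its own sigmoid gate how much of that context to add back.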

3. Adding the method

The ContextAggregation module code from CATNet, proposed in the original paper, is as follows:

import torch
import torch.nn as nn
from mmcv.cnn import ConvModule, caffe2_xavier_init, constant_init


class ContextAggregation(nn.Module):
    """
    Context Aggregation Block.
    Args:
        in_channels (int): Number of input channels.
        reduction (int, optional): Channel reduction ratio. Default: 1.
        conv_cfg (dict or None, optional): Config dict for the convolution
            layer. Default: None.
    """

    def __init__(self, in_channels, reduction=1, conv_cfg=None):
        super(ContextAggregation, self).__init__()
        self.in_channels = in_channels
        self.reduction = reduction
        self.inter_channels = max(in_channels // reduction, 1)

        conv_params = dict(kernel_size=1, conv_cfg=conv_cfg, act_cfg=None)

        self.a = ConvModule(in_channels, 1, **conv_params)
        self.k = ConvModule(in_channels, 1, **conv_params)
        self.v = ConvModule(in_channels, self.inter_channels, **conv_params)
        self.m = ConvModule(self.inter_channels, in_channels, **conv_params)

        self.init_weights()

    def init_weights(self):
        for m in (self.a, self.k, self.v):
            caffe2_xavier_init(m.conv)
        constant_init(self.m.conv, 0)

    def forward(self, x):
        n, c = x.size(0), self.inter_channels

        # a: [N, 1, H, W]
        a = self.a(x).sigmoid()

        # k: [N, 1, HW, 1]
        k = self.k(x).view(n, 1, -1, 1).softmax(2)

        # v: [N, 1, C, HW]
        v = self.v(x).view(n, 1, c, -1)

        # y: [N, C, 1, 1]
        y = torch.matmul(v, k).view(n, c, 1, 1)
        y = self.m(y) * a

        return x + y
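
For readers who want to try the block inside a YOLO code base without installing mmcv, here is a minimal sketch that uses only plain PyTorch. The class name ContextAggregationTorch is hypothetical, nn.Conv2d stands in for ConvModule (assumed here to be a biased 1x1 convolution with no norm and no activation), and the Kaiming-uniform call is my understanding of what caffe2_xavier_init does, so treat the initialization as an approximation rather than an exact port.

```python
import torch
import torch.nn as nn


class ContextAggregationTorch(nn.Module):
    """Plain-PyTorch sketch of the Context Aggregation Block (hypothetical name)."""

    def __init__(self, in_channels: int, reduction: int = 1):
        super().__init__()
        self.inter_channels = max(in_channels // reduction, 1)
        # Biased 1x1 convolutions stand in for mmcv's ConvModule (assumption).
        self.a = nn.Conv2d(in_channels, 1, kernel_size=1)
        self.k = nn.Conv2d(in_channels, 1, kernel_size=1)
        self.v = nn.Conv2d(in_channels, self.inter_channels, kernel_size=1)
        self.m = nn.Conv2d(self.inter_channels, in_channels, kernel_size=1)
        self.init_weights()

    def init_weights(self):
        # Assumption: approximates caffe2_xavier_init (Kaiming uniform, a=1, fan_in).
        for conv in (self.a, self.k, self.v):
            nn.init.kaiming_uniform_(
                conv.weight, a=1, mode="fan_in", nonlinearity="leaky_relu")
            nn.init.zeros_(conv.bias)
        # Zero init of the output conv makes the block an identity at the start.
        nn.init.zeros_(self.m.weight)
        nn.init.zeros_(self.m.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c = x.size(0), self.inter_channels
        a = self.a(x).sigmoid()                        # gate:    [N, 1, H, W]
        k = self.k(x).reshape(n, 1, -1, 1).softmax(2)  # weights: [N, 1, HW, 1]
        v = self.v(x).reshape(n, 1, c, -1)             # values:  [N, 1, C', HW]
        y = torch.matmul(v, k).reshape(n, c, 1, 1)     # context: [N, C', 1, 1]
        return x + self.m(y) * a                       # output:  [N, C, H, W]


if __name__ == "__main__":
    block = ContextAggregationTorch(in_channels=64, reduction=4)
    x = torch.randn(2, 64, 20, 20)
    out = block(x)
    print(out.shape)            # same shape as the input
    print(torch.equal(out, x))  # True before training, because self.m starts at zero
```

Because the output has the same shape as the input and the block starts out as an identity mapping, it can in principle be placed after an existing feature layer without changing the surrounding channel configuration. Where exactly to place it in a given YOLO model is a design choice this sketch does not settle.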

[Figure: number of network layers and parameters after the improvement]

I trained and tested on the NWPU VHR-10 remote sensing dataset, and the experiments showed an improvement. To obtain the improved YOLO project, send me a private message for the Baidu link.
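
As a rough cross-check of how many parameters a single ContextAggregation block adds, the sketch below counts them in closed form and compares the result against four plain nn.Conv2d layers. It assumes each 1x1 convolution carries a bias, which I believe is ConvModule's behavior when no normalization layer is configured. The function names are hypothetical, and the number is for one block with 256 input channels, not for the full improved network.

```python
import torch.nn as nn


def context_aggregation_param_count(in_channels: int, reduction: int = 1) -> int:
    """Closed-form parameter count of the four biased 1x1 convs (a, k, v, m)."""
    inter = max(in_channels // reduction, 1)
    a = in_channels + 1                     # C  -> 1
    k = in_channels + 1                     # C  -> 1
    v = in_channels * inter + inter         # C  -> C'
    m = inter * in_channels + in_channels   # C' -> C
    return a + k + v + m


def count_parameters(module: nn.Module) -> int:
    """Count trainable parameters of any PyTorch module."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)


# Stand-in layers with the same shapes as a, k, v, m for C = 256, reduction = 1.
layers = nn.ModuleList([
    nn.Conv2d(256, 1, kernel_size=1),
    nn.Conv2d(256, 1, kernel_size=1),
    nn.Conv2d(256, 256, kernel_size=1),
    nn.Conv2d(256, 256, kernel_size=1),
])

print(context_aggregation_param_count(256))  # 132098
print(count_parameters(layers))              # 132098
```

The same count_parameters helper can be applied to a whole model before and after adding the block to see the overall increase.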

4. Summary

A preview: the next article will continue to share related improvement methods for deep learning algorithms. If you are interested, follow me; if you have any questions, leave a comment or send me a private message.

PS: This method is suitable not only for improving YOLOv5 but also for other YOLO networks and object detection networks, such as YOLOv7, v6, v4, v3, Faster R-CNN, SSD, etc.

Finally, if you need the materials, follow me and send me a private message. Followers can receive free learning materials for deep learning algorithms!

YOLO series algorithm improvement method | Directory list

Origin blog.csdn.net/m0_70388905/article/details/131137887