[YOLOv8/YOLOv7/YOLOv5/YOLOv4/Faster R-CNN series algorithm improvement No.65] CVPR 2023 | Tsinghua team's plug-and-play network architecture: Slide-Transformer

Preface
As the current state-of-the-art deep learning object detection algorithm, YOLOv8 has accumulated a large number of tricks, but there is still room for improvement. Different improvement methods can be applied to the detection difficulties of specific application scenarios. The following series of articles explains in detail how to improve YOLOv8, aiming to provide a modest measure of help and reference for students engaged in scientific research who need points of innovation, and for friends working on engineering projects who want better results. Since YOLOv5 appeared in 2020, followed by YOLOv7 and YOLOv8, a large number of improvement papers have emerged, and for researchers and working engineers alike, the value and novelty of such research are no longer enough on their own. To keep pace with the times, future improved algorithms will be based on YOLOv7; the earlier YOLOv5 improvement methods also apply to YOLOv7, so the numbering of the YOLOv5 improvement series is continued here. In addition, these improvement methods can also be applied to other algorithms such as YOLOv5. I hope this is helpful to everyone.

1. Problem addressed

The Tsinghua team proposes a novel local attention module, Slide Attention, that leverages common convolution operations to achieve high efficiency, flexibility, and generality. It is an upgrade over the previously covered SENet attention mechanism and can improve object detection performance.

2. Basic principles

Paper: Slide-Transformer: Hierarchical Vision Transformer with Local Self-Attention

Code: https://github.com/LeapLabTHU/Slide-Transformer

Abstract: Self-attention mechanisms have been a key factor in the progress of vision transformers (ViTs), enabling adaptive feature extraction from the global context. However, existing self-attention methods either adopt sparse global attention or window attention to reduce computational complexity, which may compromise local feature learning, or rely on hand-crafted designs. In contrast, local attention restricts the receptive field of each query to its own neighboring pixels, enjoying the benefits of both convolution and self-attention, namely local inductive bias and dynamic feature selection. Nonetheless, current local attention modules either use the inefficient Im2Col function or rely on specific CUDA kernels that are difficult to port to devices without CUDA support. In this paper, we propose a new local attention module, Slide Attention, which leverages common convolution operations for high efficiency, flexibility, and generalizability. Specifically, we first reinterpret the column-based Im2Col function from a new row-based perspective and use depthwise convolution as an efficient replacement. On this basis, we propose a deformed shift module based on the reparameterization technique, which further relaxes the fixed key/value positions to deformed features in the local region. In this way, our module implements the local attention paradigm in an efficient and flexible manner. Extensive experiments demonstrate that our Slide Attention module is applicable to various state-of-the-art vision transformer models, is compatible with various hardware devices, and achieves consistently improved performance on comprehensive benchmarks.
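To make the core idea concrete, here is a minimal PyTorch sketch of the neighborhood-gathering trick described in the abstract. This is not the authors' implementation (see the repository above for that): the class name SlideAttentionSketch and all hyperparameters are illustrative, and the paper's learnable deformed shift module is omitted. Each of the k×k neighborhood positions is materialized with a fixed one-hot depthwise convolution, replacing Im2Col, and local attention is then computed per pixel over those k×k keys/values.

```python
# Minimal sketch of slide attention, assuming PyTorch. Illustrative only:
# it shows the depthwise-convolution replacement for Im2Col, but omits the
# reparameterized deformed shifts from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SlideAttentionSketch(nn.Module):
    def __init__(self, dim, kernel_size=3, num_heads=4):
        super().__init__()
        assert dim % num_heads == 0
        self.dim, self.k, self.h = dim, kernel_size, num_heads
        self.scale = (dim // num_heads) ** -0.5
        self.qkv = nn.Conv2d(dim, dim * 3, 1)
        self.proj = nn.Conv2d(dim, dim, 1)
        # One fixed one-hot k*k filter per neighborhood position: applying it
        # as a depthwise convolution shifts the feature map by that offset.
        shift = torch.zeros(kernel_size**2, 1, kernel_size, kernel_size)
        for i in range(kernel_size**2):
            shift[i, 0, i // kernel_size, i % kernel_size] = 1.0
        self.register_buffer("shift", shift)

    def _gather(self, x):
        # (B, C, H, W) -> (B, C, k*k, H, W): neighborhoods via depthwise conv,
        # i.e. the row-based replacement for the column-based Im2Col.
        B, C, H, W = x.shape
        w = self.shift.repeat(C, 1, 1, 1)              # (C*k*k, 1, k, k)
        out = F.conv2d(x, w, padding=self.k // 2, groups=C)
        return out.reshape(B, C, self.k**2, H, W)

    def forward(self, x):
        B, C, H, W = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=1)
        k, v = self._gather(k), self._gather(v)
        d = C // self.h
        q = q.reshape(B, self.h, d, 1, H * W)          # one query per pixel
        k = k.reshape(B, self.h, d, self.k**2, H * W)
        v = v.reshape(B, self.h, d, self.k**2, H * W)
        attn = (q * k).sum(2, keepdim=True) * self.scale  # (B, h, 1, k*k, HW)
        attn = attn.softmax(dim=3)                        # over the k*k keys
        out = (attn * v).sum(3).reshape(B, C, H, W)
        return self.proj(out)
```

Because every query attends only to its own k×k window, the cost grows linearly with the number of pixels, which is what makes local attention attractive compared with global self-attention.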

3. How to add it

For the relevant code and the specific improvement method, follow me and send a private message.
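While the full integration code is behind a private message, a quick, hypothetical smoke test of the sketch from Section 2 shows why such a block is plug-and-play: it preserves the (B, C, H, W) shape of a feature map, so in principle it can be inserted after a backbone or neck stage of a YOLOv5/v7/v8-style model (e.g., by adding the class to the model code and registering it in the config parser). The feature-map size below is illustrative.

```python
# Hypothetical smoke test for SlideAttentionSketch (defined in Section 2).
import torch

x = torch.randn(1, 128, 40, 40)      # illustrative P3-level feature map
block = SlideAttentionSketch(dim=128, kernel_size=3, num_heads=4)
y = block(x)
assert y.shape == x.shape            # shape-preserving => drop-in friendly
print(y.shape)                       # torch.Size([1, 128, 40, 40])
```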

4. Summary

A preview: the next article will continue to share related improvement methods for deep learning algorithms. Interested friends can follow me; if you have any questions, you can leave a comment or message me privately.

PS: This method is not only suitable for improving YOLOv5; it can also be used to improve other YOLO networks and object detection networks, such as YOLOv7, v6, v4, v3, Faster R-CNN, SSD, etc.

Finally, if you need it, follow me and send a private message. Followers can also receive free study materials on deep learning algorithms!

YOLO series algorithm improvement method | Directory list

Origin: blog.csdn.net/m0_70388905/article/details/130150969