YOLOv5 Improvements | Backbone Series | CSWin Transformer: Cross-Shaped Window Network for an Improved Feature Fusion Layer

 1. Introduction to this article

The improvement covered in this article is CSWin Transformer. It builds on the Transformer architecture and introduces a cross-shaped window self-attention mechanism: horizontal and vertical stripes of the image are processed in parallel, and together they form a cross-shaped window, which improves computational efficiency. It also proposes Locally-Enhanced Positional Encoding (LePE) to handle local position information better. I use it to replace YOLOv5's feature extraction network (the backbone) so that more useful features are extracted. In my experiments, this backbone did improve detection of all three object sizes: large, medium, and small. The backbone also comes in multiple versions, and you can switch between them in the source code. This article first introduces the main principles of the framework and then shows you how to add the network structure to your model.
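To make the cross-shaped window idea concrete, here is a minimal sketch in pure Python. It is not the CSWin Transformer source code: the function names (`stripe_window`, `cross_window`, `render`) and the grid size are hypothetical, chosen only for illustration. It assumes the feature map is cut into stripes of width `sw`, that one group of attention heads attends within horizontal stripes, and that the other group attends within vertical stripes. The sketch only computes which positions a single query pixel can see once both groups are combined.

```python
def stripe_window(h, w, row, col, sw, horizontal):
    """Positions that share a stripe of width `sw` with the query at (row, col)."""
    if horizontal:
        # Horizontal stripe: `sw` consecutive rows, spanning the full width.
        start = (row // sw) * sw
        return {(r, c) for r in range(start, min(start + sw, h)) for c in range(w)}
    # Vertical stripe: `sw` consecutive columns, spanning the full height.
    start = (col // sw) * sw
    return {(r, c) for r in range(h) for c in range(start, min(start + sw, w))}


def cross_window(h, w, row, col, sw):
    """Union of the horizontal and vertical stripes: a cross-shaped region."""
    return (stripe_window(h, w, row, col, sw, True)
            | stripe_window(h, w, row, col, sw, False))


def render(h, w, row, col, sw):
    """Draw the region as text: Q = query, # = attended position, . = not attended."""
    region = cross_window(h, w, row, col, sw)
    lines = []
    for r in range(h):
        lines.append("".join(
            "Q" if (r, c) == (row, col) else "#" if (r, c) in region else "."
            for c in range(w)
        ))
    return "\n".join(lines)


if __name__ == "__main__":
    # 8x8 feature map, query at row 3, column 5, stripe width 2.
    print(render(8, 8, 3, 5, 2))
```

Running the script prints a cross centered on the stripes that contain the query. A larger `sw` widens both arms of the cross, so each token sees more context at a higher attention cost.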

Recommendation index: ⭐⭐⭐⭐

Accuracy gain:


Reposted from: blog.csdn.net/java1314777/article/details/135443930