NEU-DET steel surface defect detection based on YOLOv8, with a highly novel optimization combination: PConv and BiLevelRoutingAttention (both CVPR 2023), yielding clear accuracy gains

1. Introduction to the steel defect dataset

The NEU-DET steel surface defect dataset contains six defect categories: 'crazing', 'inclusion', 'patches', 'pitted_surface', 'rolled-in_scale', and 'scratches'.
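For reference, the six class names can be kept in a simple id-to-name mapping when building a detection config (the index order here is an assumption; match it to the class order in your own annotation files):

```python
# NEU-DET defect categories; the id ordering below is an assumption,
# so align it with the class order used in your own label files.
NEU_DET_CLASSES = {
    0: "crazing",
    1: "inclusion",
    2: "patches",
    3: "pitted_surface",
    4: "rolled-in_scale",
    5: "scratches",
}

print(len(NEU_DET_CLASSES))  # → 6
```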

The distribution of each category is:

2. Training based on YOLOv8

The original network is as follows:

mAP@0.5 is 0.733

 

3. PConv

3.1 Introduction to FasterNet

To design fast neural networks, much work has focused on reducing the number of floating-point operations (FLOPs). However, the authors observed that reducing FLOPs does not necessarily yield a comparable reduction in latency. This mainly stems from inefficiently low floating-point operations per second (FLOPS). To achieve faster networks, the authors revisited popular operators and showed that such low FLOPS is mainly caused by frequent memory access, especially in depthwise convolution. The paper therefore proposes a new partial convolution (PConv) that extracts spatial features more efficiently by reducing redundant computation and memory access at the same time.

Based on PConv, the authors further propose FasterNet, a new family of neural networks that runs much faster than competing networks (such as MobileViT) across a wide range of devices without sacrificing accuracy on various vision tasks. For example, on ImageNet-1k, the small FasterNet-T0 is 3.1x, 3.1x, and 2.5x faster than MobileViT-XXS on GPU, CPU, and ARM processors respectively, while being 2.9% more accurate.

Paper address: https://arxiv.org/abs/2303.03667

GitHub: https://github.com/JierunChen/FasterNet (code release for PConv and FasterNet)

3.2 Partial Convolution

The paper proposes a new partial convolution (PConv) that extracts spatial features more efficiently by simultaneously reducing redundant computation and memory access: a regular convolution is applied to only a fraction of the input channels, while the remaining channels pass through untouched.
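As a minimal PyTorch sketch (not the authors' exact code; the 1/4 channel split follows the paper's default n_div = 4):

```python
import torch
import torch.nn as nn

class PConv(nn.Module):
    """Partial convolution: convolve only the first dim // n_div channels
    and pass the remaining channels through unchanged, which reduces both
    FLOPs and memory access compared with a full convolution."""

    def __init__(self, dim: int, n_div: int = 4, kernel_size: int = 3):
        super().__init__()
        self.dim_conv = dim // n_div              # channels that get convolved
        self.dim_untouched = dim - self.dim_conv  # channels passed through
        self.conv = nn.Conv2d(self.dim_conv, self.dim_conv, kernel_size,
                              padding=kernel_size // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Split along the channel dimension, convolve one part, keep the rest.
        x1, x2 = torch.split(x, [self.dim_conv, self.dim_untouched], dim=1)
        return torch.cat((self.conv(x1), x2), dim=1)
```

Dropping such a block into a YOLOv8 backbone in place of a standard convolution is how the modification above is typically wired in.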

mAP@0.5 is 0.756

Schematic:

4. BiLevelRoutingAttention

4.1 Introduction to BiFormer

Paper: https://arxiv.org/pdf/2303.08810.pdf

Background: The attention mechanism is one of the core building blocks of Vision Transformer and can capture long-range dependencies. However, this powerful feature comes with a huge computational burden and memory overhead due to the need to compute pairwise token interactions between all spatial locations. To alleviate this problem, a series of works attempt to solve this problem by introducing hand-crafted and content-independent sparsity into attention, such as restricting attention operations to local windows, axial stripes, or dilated windows.

Method of this article: The paper proposes dynamic sparse attention via bi-level routing. For a query, irrelevant key-value pairs are first filtered out at a coarse region level, and fine-grained token-to-token attention is then applied over the union of the remaining candidate regions (i.e., the routing regions). The proposed bi-level routing attention has a simple yet effective implementation, exploits sparsity to save computation and memory, and involves only GPU-friendly dense matrix multiplications. On this basis, a new general-purpose vision Transformer called BiFormer is built.

In the figure, (a) is vanilla attention, which operates at global scale and therefore incurs high computational complexity and heavy memory usage. (b)-(d) reduce complexity by introducing different hand-crafted sparsity patterns, such as local windows, axial stripes, and dilated windows, while (e) uses deformable attention to achieve image-adaptive sparsity through irregular grids. The authors argue that these methods mostly try to alleviate the problem by introducing hand-crafted, content-independent sparsity into the attention mechanism. This paper therefore proposes a novel dynamic sparse attention via bi-level routing, which enables more flexible computation allocation and content awareness, giving the attention dynamic, query-aware sparsity, as shown in figure (f).
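To make the routing idea concrete, here is a heavily simplified, single-head sketch in PyTorch (no learned Q/K/V projections, no LEPE term, regions flattened to 1-D; the real BiFormer implementation is considerably more elaborate):

```python
import torch

def bi_level_routing_attention(x: torch.Tensor, num_regions: int, topk: int) -> torch.Tensor:
    """Sketch of bi-level routing attention.
    x: (B, N, C) token sequence, where N divides evenly into num_regions."""
    B, N, C = x.shape
    r, s = num_regions, N // num_regions            # regions, tokens per region
    q = k = v = x                                   # identity projections (sketch only)

    # Coarse level: region descriptors by mean-pooling, then a region-to-region
    # affinity matrix; keep only the top-k most relevant regions per query region.
    qr = q.view(B, r, s, C).mean(dim=2)             # (B, r, C)
    kr = k.view(B, r, s, C).mean(dim=2)             # (B, r, C)
    affinity = qr @ kr.transpose(-1, -2)            # (B, r, r)
    idx = affinity.topk(topk, dim=-1).indices       # (B, r, topk)

    # Gather the key/value tokens of the routed regions for each query region.
    kv = k.view(B, r, s, C)
    idx_e = idx[..., None, None].expand(B, r, topk, s, C)
    k_sel = torch.gather(kv.unsqueeze(1).expand(B, r, r, s, C), 2, idx_e)
    v_sel = k_sel                                   # v == k in this sketch
    k_sel = k_sel.reshape(B, r, topk * s, C)
    v_sel = v_sel.reshape(B, r, topk * s, C)

    # Fine level: token-to-token attention restricted to the routed regions.
    qt = q.view(B, r, s, C)
    attn = (qt @ k_sel.transpose(-1, -2)) / C ** 0.5   # (B, r, s, topk*s)
    out = attn.softmax(dim=-1) @ v_sel                  # (B, r, s, C)
    return out.reshape(B, N, C)
```

The filtering at the coarse level is what keeps the fine-grained attention sparse: each query region attends to topk * s tokens instead of all N, and everything is expressed as dense matrix multiplication.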

mAP@0.5 is 0.746

Schematic:

5. PConv + BiLevelRoutingAttention

mAP@0.5 is 0.761

6. Summary

By introducing the CVPR 2023 ideas of PConv and BiLevelRoutingAttention, we achieved a clear improvement on steel defect detection, raising mAP@0.5 from 0.733 to 0.761. Compared with some published papers, the degree of innovation and novelty is considerably stronger. If you need it, you can run the same experiments on your own dataset, and there is a good chance the resulting paper will be published successfully!

7. Source code acquisition

NEU-DET steel surface defect detection based on Yolov8, the optimized combination is highly novel: CVPR2023 PConv and BiLevelRoutingAttention, with obvious growth points_AI Little Monster's Blog-CSDN Blog


Origin blog.csdn.net/m0_63774211/article/details/132790913