The latest progress in object detection (2022)

Foreword

The previous article, an overview of object detection, covered the fundamentals in detail. This post serves as an extension and supplement, recording the latest progress in object detection as of 2022, focusing mainly on well-known networks that dominate the COCO test-dev leaderboard. For details, please refer to the relevant papers and code.

Swin Transformer V2

Paper address: Swin Transformer V2: Scaling Up Capacity and Resolution
Code address: Swin Transformer V2 Code
This work scales Swin Transformer up to 3 billion parameters and makes it capable of training with images of up to 1,536×1,536 resolution. By scaling up capacity and resolution, Swin Transformer sets records on four representative vision benchmarks: 84.0% top-1 accuracy on ImageNet-V2 image classification, 63.1/54.4 box/mask mAP on COCO object detection, 59.9 mIoU on ADE20K semantic segmentation, and 86.8% top-1 accuracy on Kinetics-400 video action classification. Scaling up vision models has not been explored as widely as scaling NLP language models, partly because of the following difficulties in training and application: 1) large vision models often suffer from training instability; 2) many downstream vision tasks require high-resolution images or large attention windows, and it is unclear how to effectively transfer a model pretrained at low resolution to a higher-resolution counterpart; 3) GPU memory consumption becomes prohibitive at high image resolutions. To address these issues, the research team proposes several techniques, using Swin Transformer as a case study: 1) a post-normalization technique and a scaled cosine attention method to improve the stability of large vision models; 2) a log-spaced continuous position bias technique to effectively transfer models pretrained on low-resolution images and windows to their higher-resolution counterparts. In addition, the team shares key implementation details that yield significant savings in GPU memory consumption, making it feasible to train large vision models on conventional GPUs.
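
As a concrete illustration of two of the headline techniques, below is a minimal PyTorch sketch of window attention using scaled cosine attention and a log-spaced continuous position bias, written from the paper's description; the module name, tensor shapes, and the 512-unit hidden size of the bias MLP are assumptions of this sketch, not the exact official code.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaledCosineWindowAttention(nn.Module):
    """Window attention with scaled cosine attention and a log-spaced
    continuous relative position bias, in the spirit of Swin V2
    (simplified sketch, not the official implementation)."""

    def __init__(self, dim, num_heads, window_size):
        super().__init__()
        self.num_heads = num_heads
        wh, ww = window_size
        self.qkv = nn.Linear(dim, dim * 3, bias=False)
        self.proj = nn.Linear(dim, dim)

        # Learnable per-head temperature, kept in log space and clamped
        # so attention logits stay bounded as the model scales up.
        self.logit_scale = nn.Parameter(torch.log(10 * torch.ones(num_heads, 1, 1)))

        # Small MLP mapping log-spaced relative coordinates to a per-head
        # bias; because the bias is a continuous function of coordinates,
        # it transfers to window sizes unseen during pretraining.
        self.cpb_mlp = nn.Sequential(
            nn.Linear(2, 512), nn.ReLU(inplace=True),
            nn.Linear(512, num_heads, bias=False))

        # Table of log-spaced relative coordinates, (2*wh-1, 2*ww-1, 2).
        ch = torch.arange(-(wh - 1), wh, dtype=torch.float32)
        cw = torch.arange(-(ww - 1), ww, dtype=torch.float32)
        table = torch.stack(torch.meshgrid(ch, cw, indexing="ij"), dim=-1)
        table = torch.sign(table) * torch.log2(table.abs() + 1.0) / math.log2(8)
        self.register_buffer("coords_table", table)

        # Index of each (query, key) token pair into the coordinate table.
        coords = torch.stack(torch.meshgrid(
            torch.arange(wh), torch.arange(ww), indexing="ij")).flatten(1)
        rel = (coords[:, :, None] - coords[:, None, :]).permute(1, 2, 0).contiguous()
        rel[:, :, 0] += wh - 1
        rel[:, :, 1] += ww - 1
        self.register_buffer("rel_index", rel[:, :, 0] * (2 * ww - 1) + rel[:, :, 1])

    def forward(self, x):
        # x: (num_windows * batch, N, C) with N = wh * ww tokens per window
        B_, N, C = x.shape
        qkv = self.qkv(x).reshape(B_, N, 3, self.num_heads, -1).permute(2, 0, 3, 1, 4)
        q, k, v = qkv.unbind(0)

        # Cosine similarity instead of a dot product, scaled by the
        # learnable temperature (clamped at 1/0.01 as in the paper).
        attn = F.normalize(q, dim=-1) @ F.normalize(k, dim=-1).transpose(-2, -1)
        attn = attn * torch.clamp(self.logit_scale, max=math.log(100.0)).exp()

        # Continuous position bias from the MLP over log-spaced coords.
        bias = self.cpb_mlp(self.coords_table).view(-1, self.num_heads)
        bias = bias[self.rel_index.view(-1)].view(N, N, -1).permute(2, 0, 1)
        attn = attn + 16 * torch.sigmoid(bias).unsqueeze(0)

        x = (attn.softmax(dim=-1) @ v).transpose(1, 2).reshape(B_, N, C)
        return self.proj(x)

# Example: 4 windows of 7x7 tokens, embedding dim 96, 3 heads
attn = ScaledCosineWindowAttention(dim=96, num_heads=3, window_size=(7, 7))
print(attn(torch.randn(4, 49, 96)).shape)  # torch.Size([4, 49, 96])
```

Because the bias comes from an MLP over continuous log-spaced coordinates rather than a lookup table indexed by discrete offsets, the same weights can be evaluated at a larger window size at fine-tuning time, which is exactly the low-to-high resolution transfer the paper targets.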

Swin Transformer

Paper: Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Code: Swin Transformer Code

Dynamic Head

Paper: Dynamic Head: Unifying Object Detection Heads with Attentions
Code: Dynamic Head Code

YOLOF

Paper: You Only Look One-level Feature
Code: YOLOF Code

YOLOR

Paper: You Only Learn One Representation: Unified Network for Multiple Tasks
Code: YOLOR Code

YOLOX

Paper: YOLOX: Exceeding YOLO Series in 2021
Code: YOLOX Code

Scaled-YOLOv4

Paper: Scaled-YOLOv4: Scaling Cross Stage Partial Network
Code: Scaled-YOLOv4 Code

Scale-Aware Trident Networks

Paper: Scale-Aware Trident Networks for Object Detection
Code: Scale-Aware Trident Networks Code

DETR

Paper: End-to-End Object Detection with Transformers
Code: DETR Code

Dynamic R-CNN

Paper: Dynamic R-CNN: Towards High Quality Object Detection via Dynamic Training
Code: Dynamic R-CNN Code
