CVPR 2022 Papers

Contents

- CVPR 2022 Papers
Backbone
CLIP
GAN
NAS
OCR
NeRF
Visual Transformer
- Backbone
- Application
Vision-Language
Self-supervised Learning
Data Augmentation
Object Detection
Visual Tracking
- Multi-Object Tracking
Semantic Segmentation
- Weakly Supervised Semantic Segmentation
- Semi-Supervised Semantic Segmentation
- Unsupervised Semantic Segmentation
Instance Segmentation
- Self-Supervised Instance Segmentation
- Video Instance Segmentation
Few-Shot Segmentation
Video Understanding
- Action Recognition
- Action Detection
Image Editing
Low-level Vision
Super-Resolution
- Image Super-Resolution
- Video Super-Resolution
Deblur
- Image Deblur
3D Point Cloud
3D Object Detection
3D Semantic Segmentation
3D Object Tracking
3D Human Pose Estimation
3D Semantic Scene Completion
3D Reconstruction
Camouflaged Object Detection
Depth Estimation
- Monocular Depth Estimation
Stereo Matching
Lane Detection
Image Inpainting
Image Retrieval
Face Recognition
Crowd Counting
Medical Image
Scene Graph Generation
Referring Video Object Segmentation
Style Transfer
Adversarial Examples
Weakly Supervised Object Localization
Radar Object Detection
Hyperspectral Image Reconstruction
Image Stitching
Watermarking
Grounded Situation Recognition
Zero-shot Learning
Datasets
New Task
Others

Backbone

A ConvNet for the 2020s

Paper: https://arxiv.org/abs/2201.03545
Code: https://github.com/facebookresearch/ConvNeXt
Chinese commentary: https://mp.weixin.qq.com/s/Xg5wPYExnvTqRo6s5-2cAw

Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs

Paper: https://arxiv.org/abs/2203.06717
Code: https://github.com/megvii-research/RepLKNet
Code2: https://github.com/DingXiaoH/RepLKNet-pytorch
Chinese commentary: https://mp.weixin.qq.com/s/_qXyIQut-JRW6VvsjaQlFg

MPViT: Multi-Path Vision Transformer for Dense Prediction

Paper: https://arxiv.org/abs/2112.11010
Code: https://github.com/youngwanLEE/MPViT
Chinese commentary: https://mp.weixin.qq.com/s/Q9-crEOz5IYzZaNoq8oXfg

Mobile-Former: Bridging MobileNet and Transformer

Paper: https://arxiv.org/abs/2108.05895
Code: None
Chinese commentary: https://mp.weixin.qq.com/s/yo5KmB2Y7t2R4jiOKI87HQ

MetaFormer is Actually What You Need for Vision

Paper: https://arxiv.org/abs/2111.11418
Code: https://github.com/sail-sg/poolformer

Shunted Self-Attention via Multi-Scale Token Aggregation

Paper(Oral): https://arxiv.org/abs/2111.15193
Code: https://github.com/OliverRensu/Shunted-Transformer

CLIP

HairCLIP: Design Your Hair by Text and Reference Image

Paper: https://arxiv.org/abs/2112.05142
Code: https://github.com/wty-ustc/HairCLIP

PointCLIP: Point Cloud Understanding by CLIP

Paper: https://arxiv.org/abs/2112.02413
Code: https://github.com/ZrrSkywalker/PointCLIP

Blended Diffusion for Text-driven Editing of Natural Images

Paper: https://arxiv.org/abs/2111.14818
Code: https://github.com/omriav/blended-diffusion

GAN

SemanticStyleGAN: Learning Compositional Generative Priors for Controllable Image Synthesis and Editing

Homepage: https://semanticstylegan.github.io/
Paper: https://arxiv.org/abs/2112.02236
Demo: https://semanticstylegan.github.io/videos/demo.mp4

Style Transformer for Image Inversion and Editing

Paper: https://arxiv.org/abs/2203.07932
Code: https://github.com/sapphire497/style-transformer

NAS

β-DARTS: Beta-Decay Regularization for Differentiable Architecture Search

Paper: https://arxiv.org/abs/2203.01665
Code: https://github.com/Sunshine-Ye/Beta-DARTS

ISNAS-DIP: Image-Specific Neural Architecture Search for Deep Image Prior

Paper: https://arxiv.org/abs/2111.15362
Code: None

OCR

SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition

Paper: https://arxiv.org/abs/2203.10209
Code: https://github.com/mxin262/SwinTextSpotter

NeRF

Mip-NeRF 360: Unbounded Anti-Aliased Neural Radiance Fields

Homepage: https://jonbarron.info/mipnerf360/
Paper: https://arxiv.org/abs/2111.12077
Demo: https://youtu.be/YStDS2-Ln1s

Point-NeRF: Point-based Neural Radiance Fields

Homepage: https://xharlie.github.io/projects/project_sites/pointnerf/
Paper: https://arxiv.org/abs/2201.08845
Code: https://github.com/Xharlie/point-nerf

NeRF in the Dark: High Dynamic Range View Synthesis from Noisy Raw Images

Paper: https://arxiv.org/abs/2111.13679
Homepage: https://bmild.github.io/rawnerf/
Demo: https://www.youtube.com/watch?v=JtBS4KBcKVc

Urban Radiance Fields

Homepage: https://urban-radiance-fields.github.io/
Paper: https://arxiv.org/abs/2111.14643
Demo: https://youtu.be/qGlq5DZT6uc

Pix2NeRF: Unsupervised Conditional π-GAN for Single Image to Neural Radiance Fields Translation

Paper: https://arxiv.org/abs/2202.13162
Code: https://github.com/HexagonPrime/Pix2NeRF

HumanNeRF: Free-viewpoint Rendering of Moving People from Monocular Video

Homepage: https://grail.cs.washington.edu/projects/humannerf/
Paper: https://arxiv.org/abs/2201.04127
Demo: https://youtu.be/GM-RoZEymmw

Visual Transformer

Backbone

MPViT: Multi-Path Vision Transformer for Dense Prediction

Paper: https://arxiv.org/abs/2112.11010
Code: https://github.com/youngwanLEE/MPViT

MetaFormer is Actually What You Need for Vision

Paper: https://arxiv.org/abs/2111.11418
Code: https://github.com/sail-sg/poolformer

Mobile-Former: Bridging MobileNet and Transformer

Paper: https://arxiv.org/abs/2108.05895
Code: None
Chinese commentary: https://mp.weixin.qq.com/s/yo5KmB2Y7t2R4jiOKI87HQ

Shunted Self-Attention via Multi-Scale Token Aggregation

Paper(Oral): https://arxiv.org/abs/2111.15193
Code: https://github.com/OliverRensu/Shunted-Transformer

Application

Language-based Video Editing via Multi-Modal Multi-Level Transformer

Paper: https://arxiv.org/abs/2104.01122
Code: None

MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video

Paper: https://arxiv.org/abs/2203.00859
Code: None

Embracing Single Stride 3D Object Detector with Sparse Transformer

Paper: https://arxiv.org/abs/2112.06375
Code: https://github.com/TuSimple/SST

Multi-class Token Transformer for Weakly Supervised Semantic Segmentation

Paper: https://arxiv.org/abs/2203.02891
Code: https://github.com/xulianuwa/MCTformer

Spatio-temporal Relation Modeling for Few-shot Action Recognition

Paper: https://arxiv.org/abs/2112.05132
Code: https://github.com/Anirudh257/strm

Mask-guided Spectral-wise Transformer for Efficient Hyperspectral Image Reconstruction

Paper: https://arxiv.org/abs/2111.07910
Code: https://github.com/caiyuanhao1998/MST

Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling

Homepage: https://point-bert.ivg-research.xyz/
Paper: https://arxiv.org/abs/2111.14819
Code: https://github.com/lulutang0608/Point-BERT

GroupViT: Semantic Segmentation Emerges from Text Supervision

Homepage: https://jerryxu.net/GroupViT/
Paper: https://arxiv.org/abs/2202.11094
Demo: https://youtu.be/DtJsWIUTW-Y

Restormer: Efficient Transformer for High-Resolution Image Restoration

Paper: https://arxiv.org/abs/2111.09881
Code: https://github.com/swz30/Restormer

Splicing ViT Features for Semantic Appearance Transfer

Homepage: https://splice-vit.github.io/
Paper: https://arxiv.org/abs/2201.00424
Code: https://github.com/omerbt/Splice

Self-supervised Video Transformer

Homepage: https://kahnchana.github.io/svt/
Paper: https://arxiv.org/abs/2112.01514
Code: https://github.com/kahnchana/svt

Learning Affinity from Attention: End-to-End Weakly-Supervised Semantic Segmentation with Transformers

Paper: https://arxiv.org/abs/2203.02664
Code: https://github.com/rulixiang/afa

Accelerating DETR Convergence via Semantic-Aligned Matching

Paper: https://arxiv.org/abs/2203.06883
Code: https://github.com/ZhangGongjie/SAM-DETR

DN-DETR: Accelerate DETR Training by Introducing Query DeNoising

Paper: https://arxiv.org/abs/2203.01305
Code: https://github.com/FengLi-ust/DN-DETR
Chinese commentary: https://mp.weixin.qq.com/s/xdMfZ_L628Ru1d1iaMny0w

Style Transformer for Image Inversion and Editing

Paper: https://arxiv.org/abs/2203.07932
Code: https://github.com/sapphire497/style-transformer

MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer

Paper: https://arxiv.org/abs/2203.10981
Code: https://github.com/kuanchihhuang/MonoDTR

Mask Transfiner for High-Quality Instance Segmentation

Paper: https://arxiv.org/abs/2111.13673
Code: https://github.com/SysCV/transfiner

Language as Queries for Referring Video Object Segmentation

Paper: https://arxiv.org/abs/2201.00487
Code: https://github.com/wjn922/ReferFormer
Chinese commentary: https://mp.weixin.qq.com/s/MkQT8QWSYoYVhJ1RSF6oPQ

X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning

Paper: https://arxiv.org/abs/2203.00843
Code: https://github.com/CurryYuan/X-Trans2Cap

AdaMixer: A Fast-Converging Query-Based Object Detector

Paper(Oral): https://arxiv.org/abs/2203.16507
Code: https://github.com/MCG-NJU/AdaMixer

Omni-DETR: Omni-Supervised Object Detection with Transformers

Paper: https://arxiv.org/abs/2203.16089
Code: https://github.com/amazon-research/omni-detr

SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition

Paper: https://arxiv.org/abs/2203.10209
Code: https://github.com/mxin262/SwinTextSpotter

Vision-Language

Conditional Prompt Learning for Vision-Language Models

Paper: https://arxiv.org/abs/2203.05557
Code: https://github.com/KaiyangZhou/CoOp

Bridging Video-text Retrieval with Multiple Choice Question

Paper: https://arxiv.org/abs/2201.04850
Code: https://github.com/TencentARC/MCQ

Self-supervised Learning

UniVIP: A Unified Framework for Self-Supervised Visual Pre-training

Paper: https://arxiv.org/abs/2203.06965
Code: None

Crafting Better Contrastive Views for Siamese Representation Learning

Paper: https://arxiv.org/abs/2202.03278
Code: https://github.com/xyupeng/ContrastiveCrop
Chinese commentary: https://mp.weixin.qq.com/s/VTP9D5f7KG9vg30U9kVI2A

HCSC: Hierarchical Contrastive Selective Coding

Homepage: https://github.com/gyfastas/HCSC
Paper: https://arxiv.org/abs/2202.00455
Chinese commentary: https://mp.weixin.qq.com/s/jkYR8mYp-e645qk8kfPNKQ

Data Augmentation

TeachAugment: Data Augmentation Optimization Using Teacher Knowledge

Paper: https://arxiv.org/abs/2202.12513
Code: https://github.com/DensoITLab/TeachAugment

AlignMix: Improving representation by interpolating aligned features

Paper: https://arxiv.org/abs/2103.15375
Code: None

Object Detection

BoxeR: Box-Attention for 2D and 3D Transformers

Paper: https://arxiv.org/abs/2111.13087
Code: https://github.com/kienduynguyen/BoxeR
Chinese commentary: https://mp.weixin.qq.com/s/UnUJJBwcAsRgz6TnQf_b7w

DN-DETR: Accelerate DETR Training by Introducing Query DeNoising

Paper: https://arxiv.org/abs/2203.01305
Code: https://github.com/FengLi-ust/DN-DETR
Chinese commentary: https://mp.weixin.qq.com/s/xdMfZ_L628Ru1d1iaMny0w

Accelerating DETR Convergence via Semantic-Aligned Matching

Paper: https://arxiv.org/abs/2203.06883
Code: https://github.com/ZhangGongjie/SAM-DETR

Localization Distillation for Dense Object Detection

Paper: https://arxiv.org/abs/2102.12252
Code: https://github.com/HikariTJU/LD
Chinese commentary: https://mp.weixin.qq.com/s/dxss8RjJH283h6IbPCT9vg

Focal and Global Knowledge Distillation for Detectors

Paper: https://arxiv.org/abs/2111.11837
Code: https://github.com/yzd-v/FGD
Chinese commentary: https://mp.weixin.qq.com/s/yDkreTudC8JL2V2ETsADwQ

A Dual Weighting Label Assignment Scheme for Object Detection

Paper: https://arxiv.org/abs/2203.09730
Code: https://github.com/strongwolf/DW

AdaMixer: A Fast-Converging Query-Based Object Detector

Paper(Oral): https://arxiv.org/abs/2203.16507
Code: https://github.com/MCG-NJU/AdaMixer

Omni-DETR: Omni-Supervised Object Detection with Transformers

Paper: https://arxiv.org/abs/2203.16089
Code: https://github.com/amazon-research/omni-detr

Visual Tracking

Correlation-Aware Deep Tracking

Paper: https://arxiv.org/abs/2203.01666
Code: None

TCTrack: Temporal Contexts for Aerial Tracking

Paper: https://arxiv.org/abs/2203.01885
Code: https://github.com/vision4robotics/TCTrack

Multi-Object Tracking

Learning of Global Objective for Network Flow in Multi-Object Tracking

Paper: https://arxiv.org/abs/2203.16210
Code: None

Semantic Segmentation

Weakly Supervised Semantic Segmentation

Class Re-Activation Maps for Weakly-Supervised Semantic Segmentation

Paper: https://arxiv.org/abs/2203.00962
Code: https://github.com/zhaozhengChen/ReCAM

Multi-class Token Transformer for Weakly Supervised Semantic Segmentation

Paper: https://arxiv.org/abs/2203.02891
Code: https://github.com/xulianuwa/MCTformer

Learning Affinity from Attention: End-to-End Weakly-Supervised Semantic Segmentation with Transformers

Paper: https://arxiv.org/abs/2203.02664
Code: https://github.com/rulixiang/afa

Semi-Supervised Semantic Segmentation

ST++: Make Self-training Work Better for Semi-supervised Semantic Segmentation

Paper: https://arxiv.org/abs/2106.05095
Code: https://github.com/LiheYoung/ST-PlusPlus
Chinese commentary: https://mp.weixin.qq.com/s/knSnlebdtEnmrkChGM_0CA

Semi-Supervised Semantic Segmentation Using Unreliable Pseudo-Labels

Homepage: https://haochen-wang409.github.io/U2PL/
Paper: https://arxiv.org/abs/2203.03884
Code: https://github.com/Haochen-Wang409/U2PL
Chinese commentary: https://mp.weixin.qq.com/s/-08olqE7np8A1XQzt6HAgQ

Unsupervised Semantic Segmentation

GroupViT: Semantic Segmentation Emerges from Text Supervision

Homepage: https://jerryxu.net/GroupViT/
Paper: https://arxiv.org/abs/2202.11094
Demo: https://youtu.be/DtJsWIUTW-Y

Instance Segmentation

BoxeR: Box-Attention for 2D and 3D Transformers

Paper: https://arxiv.org/abs/2111.13087
Code: https://github.com/kienduynguyen/BoxeR
Chinese commentary: https://mp.weixin.qq.com/s/UnUJJBwcAsRgz6TnQf_b7w

E2EC: An End-to-End Contour-based Method for High-Quality High-Speed Instance Segmentation

Paper: https://arxiv.org/abs/2203.04074
Code: https://github.com/zhang-tao-whu/e2ec

Mask Transfiner for High-Quality Instance Segmentation

Paper: https://arxiv.org/abs/2111.13673
Code: https://github.com/SysCV/transfiner

Self-Supervised Instance Segmentation

FreeSOLO: Learning to Segment Objects without Annotations

Paper: https://arxiv.org/abs/2202.12181
Code: None

Video Instance Segmentation

Efficient Video Instance Segmentation via Tracklet Query and Proposal

Homepage: https://jialianwu.com/projects/EfficientVIS.html
Paper: https://arxiv.org/abs/2203.01853
Demo: https://youtu.be/sSPMzgtMKCE

Few-Shot Segmentation

Learning What Not to Segment: A New Perspective on Few-Shot Segmentation

Paper: https://arxiv.org/abs/2203.07615
Code: https://github.com/chunbolang/BAM

Video Understanding

Self-supervised Video Transformer

Homepage: https://kahnchana.github.io/svt/
Paper: https://arxiv.org/abs/2112.01514
Code: https://github.com/kahnchana/svt

Action Recognition

Spatio-temporal Relation Modeling for Few-shot Action Recognition

Paper: https://arxiv.org/abs/2112.05132
Code: https://github.com/Anirudh257/strm

Action Detection

End-to-End Semi-Supervised Learning for Video Action Detection

Paper: https://arxiv.org/abs/2203.04251
Code: None

Image Editing

Style Transformer for Image Inversion and Editing

Paper: https://arxiv.org/abs/2203.07932
Code: https://github.com/sapphire497/style-transformer

Blended Diffusion for Text-driven Editing of Natural Images

Paper: https://arxiv.org/abs/2111.14818
Code: https://github.com/omriav/blended-diffusion

SemanticStyleGAN: Learning Compositional Generative Priors for Controllable Image Synthesis and Editing

Homepage: https://semanticstylegan.github.io/
Paper: https://arxiv.org/abs/2112.02236
Demo: https://semanticstylegan.github.io/videos/demo.mp4

Low-level Vision

ISNAS-DIP: Image-Specific Neural Architecture Search for Deep Image Prior

Paper: https://arxiv.org/abs/2111.15362
Code: None

Restormer: Efficient Transformer for High-Resolution Image Restoration

Paper: https://arxiv.org/abs/2111.09881
Code: https://github.com/swz30/Restormer

Super-Resolution

Image Super-Resolution

Learning the Degradation Distribution for Blind Image Super-Resolution

Paper: https://arxiv.org/abs/2203.04962
Code: https://github.com/greatlog/UnpairedSR

Video Super-Resolution

BasicVSR++: Improving Video Super-Resolution with Enhanced Propagation and Alignment

Paper: https://arxiv.org/abs/2104.13371
Code: https://github.com/open-mmlab/mmediting
Code2: https://github.com/ckkelvinchan/BasicVSR_PlusPlus
Chinese commentary: https://mp.weixin.qq.com/s/HZTwYfphixyLHxlbCAxx4g

Deblur

Image Deblur

Learning to Deblur using Light Field Generated and Real Defocus Images

Homepage: http://lyruan.com/Projects/DRBNet/
Paper(Oral): https://arxiv.org/abs/2204.00442
Code: https://github.com/lingyanruan/DRBNet

3D Point Cloud

Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling

Homepage: https://point-bert.ivg-research.xyz/
Paper: https://arxiv.org/abs/2111.14819
Code: https://github.com/lulutang0608/Point-BERT

A Unified Query-based Paradigm for Point Cloud Understanding

Paper: https://arxiv.org/abs/2203.01252
Code: None

CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding

Paper: https://arxiv.org/abs/2203.00680
Code: https://github.com/MohamedAfham/CrossPoint

PointCLIP: Point Cloud Understanding by CLIP

Paper: https://arxiv.org/abs/2112.02413
Code: https://github.com/ZrrSkywalker/PointCLIP

3D Object Detection

BoxeR: Box-Attention for 2D and 3D Transformers

Paper: https://arxiv.org/abs/2111.13087
Code: https://github.com/kienduynguyen/BoxeR
Chinese commentary: https://mp.weixin.qq.com/s/UnUJJBwcAsRgz6TnQf_b7w

Embracing Single Stride 3D Object Detector with Sparse Transformer

Paper: https://arxiv.org/abs/2112.06375
Code: https://github.com/TuSimple/SST

Canonical Voting: Towards Robust Oriented Bounding Box Detection in 3D Scenes

Paper: https://arxiv.org/abs/2011.12001
Code: https://github.com/qq456cvb/CanonicalVoting

MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer

Paper: https://arxiv.org/abs/2203.10981
Code: https://github.com/kuanchihhuang/MonoDTR

3D Semantic Segmentation

Scribble-Supervised LiDAR Semantic Segmentation

Paper: https://arxiv.org/abs/2203.08537
Dataset: https://github.com/ouenal/scribblekitti

3D Object Tracking

Beyond 3D Siamese Tracking: A Motion-Centric Paradigm for 3D Single Object Tracking in Point Clouds

Paper: https://arxiv.org/abs/2203.01730
Code: https://github.com/Ghostish/Open3DSOT

PTTR: Relational 3D Point Cloud Object Tracking with Transformer

Paper: https://arxiv.org/abs/2112.02857
Code: https://github.com/Jasonkks/PTTR

3D Human Pose Estimation

MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation

Paper: https://arxiv.org/abs/2111.12707
Code: https://github.com/Vegetebird/MHFormer
Chinese commentary: https://zhuanlan.zhihu.com/p/439459426

MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video

Paper: https://arxiv.org/abs/2203.00859
Code: None

Distribution-Aware Single-Stage Models for Multi-Person 3D Pose Estimation

Paper: https://arxiv.org/abs/2203.07697
Code: None
Chinese commentary: https://mp.weixin.qq.com/s/L_F28IFLXvs5R4V9TTUpRw

3D Semantic Scene Completion

MonoScene: Monocular 3D Semantic Scene Completion

Paper: https://arxiv.org/abs/2112.00726
Code: https://github.com/cv-rits/MonoScene

3D Reconstruction

BANMo: Building Animatable 3D Neural Models from Many Casual Videos

Homepage: https://banmo-www.github.io/
Paper: https://arxiv.org/abs/2112.12761
Code: https://github.com/facebookresearch/banmo
Chinese commentary: https://mp.weixin.qq.com/s/NMHP8-xWwrX40vpGx55Qew

Camouflaged Object Detection

Zoom In and Out: A Mixed-scale Triplet Network for Camouflaged Object Detection

Paper: https://arxiv.org/abs/2203.02688
Code: https://github.com/lartpang/ZoomNet

Depth Estimation

Monocular Depth Estimation

NeW CRFs: Neural Window Fully-connected CRFs for Monocular Depth Estimation

Paper: https://arxiv.org/abs/2203.01502
Code: None

OmniFusion: 360 Monocular Depth Estimation via Geometry-Aware Fusion

Paper: https://arxiv.org/abs/2203.00838
Code: None

Toward Practical Self-Supervised Monocular Indoor Depth Estimation

Paper: https://arxiv.org/abs/2112.02306
Code: None

P3Depth: Monocular Depth Estimation with a Piecewise Planarity Prior

Paper: https://arxiv.org/abs/2204.02091
Code: https://github.com/SysCV/P3Depth

Stereo Matching

ACVNet: Attention Concatenation Volume for Accurate and Efficient Stereo Matching

Paper: https://arxiv.org/abs/2203.02146
Code: https://github.com/gangweiX/ACVNet

Lane Detection

Rethinking Efficient Lane Detection via Curve Modeling

Paper: https://arxiv.org/abs/2203.02431
Code: https://github.com/voldemortX/pytorch-auto-drive
Demo: https://user-images.githubusercontent.com/32259501/148680744-a18793cd-f437-461f-8c3a-b909c9931709.mp4

Image Inpainting

Incremental Transformer Structure Enhanced Image Inpainting with Masking Positional Encoding

Paper: https://arxiv.org/abs/2203.00867
Code: https://github.com/DQiaole/ZITS_inpainting

Image Retrieval

Correlation Verification for Image Retrieval

Paper(Oral): https://arxiv.org/abs/2204.01458
Code: https://github.com/sungonce/CVNet

Face Recognition

AdaFace: Quality Adaptive Margin for Face Recognition

Paper(Oral): https://arxiv.org/abs/2204.00964
Code: https://github.com/mk-minchul/AdaFace

Crowd Counting

Leveraging Self-Supervision for Cross-Domain Crowd Counting

Paper: https://arxiv.org/abs/2103.16291
Code: None

Medical Image

BoostMIS: Boosting Medical Image Semi-supervised Learning with Adaptive Pseudo Labeling and Informative Active Annotation

Paper: https://arxiv.org/abs/2203.02533
Code: None

Scene Graph Generation

SGTR: End-to-end Scene Graph Generation with Transformer

Paper: https://arxiv.org/abs/2112.12970
Code: None

Referring Video Object Segmentation

Language as Queries for Referring Video Object Segmentation

Paper: https://arxiv.org/abs/2201.00487
Code: https://github.com/wjn922/ReferFormer

ReSTR: Convolution-free Referring Image Segmentation Using Transformers

Paper: https://arxiv.org/abs/2203.16768
Code: None

Style Transfer

StyleMesh: Style Transfer for Indoor 3D Scene Reconstructions

Homepage: https://lukashoel.github.io/stylemesh/
Paper: https://arxiv.org/abs/2112.01530
Code: https://github.com/lukasHoel/stylemesh
Demo: https://www.youtube.com/watch?v=ZqgiTLcNcks

Adversarial Examples

Shadows can be Dangerous: Stealthy and Effective Physical-world Adversarial Attack by Natural Phenomenon

Paper: https://arxiv.org/abs/2203.03818
Code: https://github.com/hncszyq/ShadowAttack

Weakly Supervised Object Localization

Weakly Supervised Object Localization as Domain Adaption

Paper: https://arxiv.org/abs/2203.01714
Code: https://github.com/zh460045050/DA-WSOL_CVPR2022

Radar Object Detection

Exploiting Temporal Relations on Radar Perception for Autonomous Driving

Paper: https://arxiv.org/abs/2204.01184
Code: None

Hyperspectral Image Reconstruction

Mask-guided Spectral-wise Transformer for Efficient Hyperspectral Image Reconstruction

Paper: https://arxiv.org/abs/2111.07910
Code: https://github.com/caiyuanhao1998/MST

Image Stitching

Deep Rectangling for Image Stitching: A Learning Baseline

Paper(Oral): https://arxiv.org/abs/2203.03831
Code: https://github.com/nie-lang/DeepRectangling
Dataset: https://github.com/nie-lang/DeepRectangling
Chinese commentary: https://mp.weixin.qq.com/s/lp5AnrtO_9urp-Fv6Z0l2Q

Watermarking

Deep 3D-to-2D Watermarking: Embedding Messages in 3D Meshes and Extracting Them from 2D Renderings

Paper: https://arxiv.org/abs/2104.13450
Code: None

Grounded Situation Recognition

Collaborative Transformers for Grounded Situation Recognition

Paper: https://arxiv.org/abs/2203.16518
Code: https://github.com/jhcho99/CoFormer

Zero-shot Learning

Unseen Classes at a Later Time? No Problem

Paper: https://arxiv.org/abs/2203.16517
Code: https://github.com/sumitramalagi/Unseen-classes-at-a-later-time

Datasets

It’s About Time: Analog Clock Reading in the Wild

Homepage: https://charigyang.github.io/abouttime/
Paper: https://arxiv.org/abs/2111.09162
Code: https://github.com/charigyang/itsabouttime
Demo: https://youtu.be/cbiMACA6dRc

Toward Practical Self-Supervised Monocular Indoor Depth Estimation

Paper: https://arxiv.org/abs/2112.02306
Code: None

Kubric: A scalable dataset generator

Paper: https://arxiv.org/abs/2203.03570
Code: https://github.com/google-research/kubric
Chinese commentary: https://mp.weixin.qq.com/s/mJ8HzY6C0GifxsErJIS3Mg

Scribble-Supervised LiDAR Semantic Segmentation

Paper: https://arxiv.org/abs/2203.08537
Dataset: https://github.com/ouenal/scribblekitti

Deep Rectangling for Image Stitching: A Learning Baseline

Paper(Oral): https://arxiv.org/abs/2203.03831
Code: https://github.com/nie-lang/DeepRectangling
Dataset: https://github.com/nie-lang/DeepRectangling
Chinese commentary: https://mp.weixin.qq.com/s/lp5AnrtO_9urp-Fv6Z0l2Q

ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer

Homepage: https://ai.stanford.edu/~rhgao/objectfolder2.0/
Paper: https://arxiv.org/abs/2204.02389
Dataset: https://github.com/rhgao/ObjectFolder
Demo: https://youtu.be/e5aToT3LkRA

New Task

Language-based Video Editing via Multi-Modal Multi-Level Transformer

Paper: https://arxiv.org/abs/2104.01122
Code: None

It’s About Time: Analog Clock Reading in the Wild

Homepage: https://charigyang.github.io/abouttime/
Paper: https://arxiv.org/abs/2111.09162
Code: https://github.com/charigyang/itsabouttime
Demo: https://youtu.be/cbiMACA6dRc

Splicing ViT Features for Semantic Appearance Transfer

Homepage: https://splice-vit.github.io/
Paper: https://arxiv.org/abs/2201.00424
Code: https://github.com/omerbt/Splice

Others

Kubric: A scalable dataset generator

Paper: https://arxiv.org/abs/2203.03570
Code: https://github.com/google-research/kubric
Chinese commentary: https://mp.weixin.qq.com/s/mJ8HzY6C0GifxsErJIS3Mg

X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning

Paper: https://arxiv.org/abs/2203.00843
Code: https://github.com/CurryYuan/X-Trans2Cap

Balanced MSE for Imbalanced Visual Regression

Paper(Oral): https://arxiv.org/abs/2203.16427
Code: https://github.com/jiawei-ren/BalancedMSE

Reference
https://github.com/amusi/CVPR2022-Papers-with-Code#3D-Point-Cloud
