1.【Backbone Architecture: Transformer】Multi-entity Video Transformers for Fine-Grained Video Representation Learning
- Paper address: https://arxiv.org/pdf/2311.10873
- Open source code: GitHub - facebookresearch/video_rep_learning: SSL Video Representation Learning project
2.【Anomaly Detection】NNG-Mix: Improving Semi-supervised Anomaly Detection with Pseudo-anomaly Generation
- Paper address: https://arxiv.org/pdf/2311.11961
- Open source code (coming soon): GitHub - donghao51/NNG-Mix: Improving Semi-supervised Anomaly Detection with Pseudo-anomaly Generation
3.【Semantic Segmentation】Generalized Category Discovery in Semantic Segmentation
- Paper address: https://arxiv.org/pdf/2311.11525
- Open source code (coming soon): GitHub - JethroPeng/GCDSS: The official code implementation of Generalized Category Discovery in Semantic Segmentation
4.【3D Object Detection】Sparse4D v3: Advancing End-to-End 3D Detection and Tracking
- Paper address: https://arxiv.org/pdf/2311.11722
- Open source code: GitHub - linxuewu/Sparse4D: Sparse4D v1 & v2
5.【Point Cloud】Point Cloud Self-supervised Learning via 3D to Multi-view Masked Autoencoder
- Paper address: https://arxiv.org/pdf/2311.10887
- Open source code (coming soon): GitHub - Zhimin-C/Multiview-MAE
6.【Point Cloud 3D Object Detection】Domain Generalization of 3D Object Detection by Density-Resampling
- Paper address: https://arxiv.org/pdf/2311.10845
- The code will be open-sourced soon.
7.【Medical Image Segmentation】SA-Med2D-20M Dataset: Segment Anything in 2D Medical Imaging with 20 Million masks
- Paper address: https://arxiv.org/pdf/2311.11969
- Open source code: GitHub - OpenGVLab/SAM-Med2D: Official implementation of SAM-Med2D
8.【Multimodal】VLM-Eval: A General Evaluation on Video Large Language Models
- Paper address: https://arxiv.org/pdf/2311.11865
- The code will be open-sourced soon.
9.【Multimodal】LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge
- Paper address: https://arxiv.org/pdf/2311.11860
- Open source code (coming soon): GitHub - rshaojimmy/JiuTian: JiuTian, a Multimodal Large Language Model from HITSZ
10.【Multimodal】CORE-MM: Complex Open-Ended Reasoning Evaluation For Multi-Modal Large Language Models
- Paper address: https://arxiv.org/pdf/2311.11567
- Project homepage: CORE-MM: Complex Open-ended Reasoning Evaluation for Multi-modal Large Language Models
- Open source code (coming soon): GitHub - core-mm/core-mm
11.【Multimodal】GPT-4V(ision) for Robotics: Multimodal Task Planning from Human Demonstration
- Paper address: https://arxiv.org/pdf/2311.12015
- Project homepage: https://microsoft.github.io/GPT4Vision-Robot-Manipulation-Prompts/
- The code will be open-sourced soon.
12.【Digital Human】Semantic-Preserved Point-based Human Avatar
- Paper address: https://arxiv.org/pdf/2311.11614
- Open source code (coming soon): GitHub - l1346792580123/spa
13.【Autonomous Driving】A Language Agent for Autonomous Driving
- Paper address: https://arxiv.org/pdf/2311.10813
- Open source code: GitHub - USC-GVL/Agent-Driver: A Language Agent for Autonomous Driving
14.【Diffusion】Reti-Diff: Illumination Degradation Image Restoration with Retinex-based Latent Diffusion Model
- Paper address: https://arxiv.org/pdf/2311.11638
- Open source code (coming soon): GitHub - ChunmingHe/Reti-Diff
15.【Human Pose Estimation】Multiple View Geometry Transformers for 3D Human Pose Estimation
- Paper address: https://arxiv.org/pdf/2311.10983
- Open source code (coming soon): GitHub - XunshanMan/MVGFormer
16.【Crowd Counting】Evaluating Supervision Levels Trade-Offs for Infrared-Based People Counting
- Paper address: https://arxiv.org/pdf/2311.11974
- Open source code (coming soon): GitHub - tortueTortue/IRPeopleCounting
17.【Image Restoration】Deep Equilibrium Diffusion Restoration with Parallel Sampling
- Paper address: https://arxiv.org/pdf/2311.11600
- Open source code (coming soon): GitHub - caojiezhang/DeqIR: PyTorch implementation of "Deep Equilibrium Diffusion Restoration with Parallel Sampling"
18.【NeRF】Entangled View-Epipolar Information Aggregation for Generalizable Neural Radiance Fields
- Paper address: https://arxiv.org/pdf/2311.11845
- Open source code (coming soon): GitHub - tatakai1/EVENeRF
19.【3D Reconstruction】LiDAR-HMR: 3D Human Mesh Recovery from LiDAR
- Paper address: https://arxiv.org/pdf/2311.11971
- Open source code (coming soon): GitHub - soullessrobot/LiDAR-HMR: Code and data for LiDAR-HMR: 3D Human Mesh Recovery from LiDAR
The papers have been packaged for download.
CV Computer Vision Exchange Group
The group brings together experts in object detection, image segmentation, object tracking, Transformers, multimodality, NeRF, GANs, defect detection, salient object detection, keypoint detection, super-resolution reconstruction, SLAM, face analysis, OCR, biomedical imaging, 3D reconstruction, pose estimation, autonomous driving perception, depth estimation, video understanding, behavior recognition, image dehazing, image deraining, image restoration, image retrieval, lane detection, point cloud object detection, point cloud segmentation, image compression, motion prediction, neural network quantization, network deployment, and other fields, who from time to time share technical knowledge, interview tips, and internal referral job postings.
To join the group, add the administrator on WeChat (ID: PingShanHai666). When sending the friend request, include a note in the format: school/company + research direction + nickname.
Recommended reading:
CV Computer Vision Daily Open-Source Code: Papers with Code Quick Overview (2023.11.20)
CV Computer Vision Daily Open-Source Code: Papers with Code Quick Overview (2023.11.17)
CV Computer Vision Daily Open-Source Code: Papers with Code Quick Overview (2023.11.16)