1.【Semantic Segmentation】Mobile-Seed: Joint Semantic Segmentation and Boundary Detection for Mobile Robots
-
Paper address: https://arxiv.org//pdf/2311.12651
-
Project homepage: Mobile-Seed: Joint Semantic Segmentation and Boundary Detection for Mobile Robots
-
Open source code (soon to be open source): GitHub - WHU-USI3DV/Mobile-Seed: [Arxiv'23] Mobile-Seed: Joint Semantic Segmentation and Boundary Detection for Mobile Robots
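Mobile-Seed predicts semantic segmentation and semantic boundaries jointly. To show how the two outputs relate, here is a rough sketch that derives a binary boundary map from a per-pixel class-label map. This is not the paper's method: a pixel is marked as boundary if any 4-connected neighbor has a different label. The function name `boundary_map` and the 4-connectivity rule are illustrative assumptions.

```python
from typing import List


def boundary_map(labels: List[List[int]]) -> List[List[int]]:
    """Hypothetical helper: mark a pixel as boundary (1) when any
    4-connected neighbor carries a different class label, else 0."""
    h, w = len(labels), len(labels[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                # Only compare with neighbors inside the image
                if 0 <= ny < h and 0 <= nx < w and labels[ny][nx] != labels[y][x]:
                    out[y][x] = 1
                    break
    return out


if __name__ == "__main__":
    seg = [[0, 0, 1],
           [0, 0, 1],
           [2, 2, 1]]
    for row in boundary_map(seg):
        print(row)
```

Thinner or thicker boundaries can be obtained by changing the neighborhood (for example 8-connectivity), which is a design choice rather than something fixed by the paper.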
2.【Medical Image Segmentation】Semi-supervised Medical Image Segmentation via Query Distribution Consistency
-
Paper address: https://arxiv.org//pdf/2311.12364
-
Open source code (soon to be open source): https://github.com/Rows21/DK-UXNet
3.【Super-Resolution Reconstruction】Swift Parameter-free Attention Network for Efficient Super-Resolution
-
Paper address: https://arxiv.org//pdf/2311.12770
-
Open source code: GitHub - hongyuanyu/SPAN: Swift Parameter-free Attention Network for Efficient Super-Resolution
4.【Domain Adaptation】(WACV2024) GLAD: Global-Local View Alignment and Background Debiasing for Unsupervised Video Domain Adaptation with Large Domain Gap
-
Paper address: https://arxiv.org//pdf/2311.12467
-
Open source code: GitHub - KHU-VLL/GLAD
5.【Multi-Modal】ShareGPT4V: Improving Large Multi-Modal Models with Better Captions
-
Paper address: https://arxiv.org//pdf/2311.12793
-
Project homepage: ShareGPT4V
-
Open source code (soon to be open source): https://github.com/InternLM/InternLM-XComposer/tree/main/projects/ShareGPT4V
6.【Multi-Modal】GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning
-
Paper address: https://arxiv.org//pdf/2311.12631
-
Project homepage: GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning
-
Open source code (soon to be open source): GitHub - jiaxilv/GPT4Motion
7.【Multi-Modal】From Wrong To Right: A Recursive Approach Towards Vision-Language Explanation
-
Paper address: https://arxiv.org//pdf/2311.12391
-
Open source code (soon to be open source): GitHub - para-lost/ReVisE
8.【Multi-Modal】ViLaM: A Vision-Language Model with Enhanced Visual Grounding and Generalization Capability
-
Paper address: https://arxiv.org//pdf/2311.12327
-
Open source code (coming soon): GitHub - AnonymGiant/ViLaM
9.【Multi-Modal】Boosting Audio-visual Zero-shot Learning with Large Language Models
-
Paper address: https://arxiv.org//pdf/2311.12268
-
Open source code (soon to be open source): GitHub - chenhaoxing/KDA: This repository is the code of paper 'Boosting Audio-visual Zero-shot Learning with Large Language Models'.
10.【Multi-Modal】Enhancing Novel Object Detection via Cooperative Foundational Models
-
Paper address: https://arxiv.org//pdf/2311.12068
-
Open source code (soon to be open source): GitHub - rohit901/cooperative-foundational-models: Official code for our paper "Enhancing Novel Object Detection via Cooperative Foundational Models"
11.【Autonomous Driving: Occupancy Prediction】SelfOcc: Self-Supervised Vision-Based 3D Occupancy Prediction
-
Paper address: https://arxiv.org//pdf/2311.12754
-
Open source code (soon to be open source): GitHub - huang-yh/SelfOcc: SelfOcc: Self-Supervised Vision-Based 3D Occupancy Prediction
12.【Diffusion】Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models
-
Paper address: https://arxiv.org//pdf/2311.12092
-
Open source code: GitHub - rohitgandikota/sliders: Concept Sliders for Precise Control of Diffusion Models
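Concept Sliders are built on LoRA adaptors. A LoRA adaptor adds a low-rank update B·A to a frozen weight matrix W. Multiplying that update by a scalar at inference is what allows a concept's strength to be dialed up, down, or reversed. The sketch below is a plain-Python LoRA forward pass, y = W x + scale · B (A x). It is not code from the sliders repository. The function names are hypothetical, and `scale` is assumed to fold together the usual LoRA alpha/rank factor and the slider strength.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]


def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: Vector, scale: float) -> Vector:
    """Hypothetical LoRA layer: frozen W (d_out x d_in) plus a low-rank
    update B (d_out x r) @ A (r x d_in), weighted by a slider-like scale."""
    base = matvec(W, x)                 # output of the frozen layer
    delta = matvec(B, matvec(A, x))     # low-rank correction, rank r
    return [b + scale * d for b, d in zip(base, delta)]


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]        # frozen 2x2 weight
    A = [[1.0, 1.0]]                    # rank-1 down-projection
    B = [[1.0], [0.0]]                  # rank-1 up-projection
    for s in (-1.0, 0.0, 0.5, 1.0):     # slider positions
        print(s, lora_forward(W, A, B, [1.0, 2.0], s))
```

With `scale = 0` the layer behaves exactly like the frozen model, and a negative scale pushes the output in the opposite direction. This mirrors the intuition of a slider with a neutral midpoint.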
13.【Object Counting】Point, Segment and Count: A Generalized Framework for Object Counting
-
Paper address: https://arxiv.org//pdf/2311.12386
-
Open source code (soon to be open source): GitHub - Hzzone/PseCo
14.【Video Generation】MagicDance: Realistic Human Dance Video Generation with Motions & Facial Expressions Transfer
-
Paper address: https://arxiv.org//pdf/2311.12052
-
Project homepage: MagicDance: Realistic Human Dance Video Generation with Motions & Facial Expressions Transfer.
-
Open source code (soon to be open source): GitHub - Boese0601/MagicDance: MagicDance: Realistic Human Dance Video Generation with Motions & Facial Expressions Transfer
15.【3D Reconstruction】TouchSDF: A DeepSDF Approach for 3D Shape Reconstruction using Vision-Based Tactile Sensing
-
Paper address: https://arxiv.org//pdf/2311.12602
-
Project homepage: TouchSDF
-
Open source code: GitHub - maurock/TouchSDF: Implementation of the DeepSDF paper
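TouchSDF follows the DeepSDF approach, in which a network regresses the signed distance function (SDF) of a shape: negative inside, zero on the surface, positive outside. DeepSDF trains this regression with a clamped L1 loss, so that accuracy is concentrated near the surface. The sketch below uses an analytic sphere SDF in place of a learned network to show the sign convention and the clamped loss. The helper names and the default `delta = 0.1` are illustrative assumptions, not code from the TouchSDF repository.

```python
import math
from typing import Sequence


def sphere_sdf(p: Sequence[float], center: Sequence[float], radius: float) -> float:
    """Analytic SDF of a sphere: negative inside, 0 on the surface, positive outside."""
    return math.dist(p, center) - radius


def clamp(v: float, delta: float) -> float:
    """Clip a signed distance to the band [-delta, delta]."""
    return max(-delta, min(delta, v))


def clamped_l1(pred: float, target: float, delta: float = 0.1) -> float:
    """DeepSDF-style clamped L1: errors outside the +/-delta band are not penalized."""
    return abs(clamp(pred, delta) - clamp(target, delta))


if __name__ == "__main__":
    c, r = (0.0, 0.0, 0.0), 1.0
    print(sphere_sdf((0.0, 0.0, 0.0), c, r))   # inside the sphere
    print(sphere_sdf((2.0, 0.0, 0.0), c, r))   # outside the sphere
    print(clamped_l1(0.5, 2.0))                # both far outside: no loss
    print(clamped_l1(0.05, 0.0))               # near-surface error counts
```

In a real DeepSDF-style model, `pred` would come from a network conditioned on a per-shape latent code. In TouchSDF, the target samples would presumably be derived from tactile observations.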
CV Computer Vision Exchange Group
The group includes experts in object detection, image segmentation, object tracking, Transformers, multi-modality, NeRF, GANs, defect detection, salient object detection, keypoint detection, super-resolution reconstruction, SLAM, face analysis, OCR, biomedical imaging, 3D reconstruction, pose estimation, autonomous driving perception, depth estimation, video understanding, action recognition, image dehazing, image deraining, image restoration, image retrieval, lane detection, point cloud object detection, point cloud segmentation, image compression, motion prediction, neural network quantization, network deployment, and other fields. From time to time, they share technical knowledge, interview tips, and internal referral job postings.
To join the group, please add the administrator on WeChat (ID: PingShanHai666). When sending the friend request, please include a note in the format: school/company + research direction + nickname.
Recommended reading:
CV computer vision daily open source code Paper with code quick overview-2023.11.21
CV computer vision daily open source code Paper with code quick overview-2023.11.20
CV computer vision daily open source code Paper with code quick overview-2023.11.17
CV computer vision daily open source code Paper with code quick overview-2023.11.16