1.【Backbone Architecture: Transformer】White-Box Transformers via Sparse Rate Reduction: Compression Is All There Is?
- Paper: https://arxiv.org/pdf/2311.13110
- Project page: White-Box Transformers via Sparse Rate Reduction
- Code: https://github.com/Ma-Lab-Berkeley/CRATE
2.【Rotated Object Detection】Innovative Horizons in Aerial Imagery: LSKNet Meets DiffusionDet for Advanced Object Detection
- Paper: https://arxiv.org/pdf/2311.12956
- Code: https://github.com/SashaMatsun/LSKDiffDet
3.【Image Segmentation】Visual In-Context Prompting
- Paper: https://arxiv.org/pdf/2311.13601
- Code (coming soon): https://github.com/UX-Decoder/DINOv
4.【Medical Image Segmentation】SegVol: Universal and Interactive Volumetric Medical Image Segmentation
- Paper: https://arxiv.org/pdf/2311.13385
- Code: https://github.com/BAAI-DCAI/SegVol
5.【Domain Adaptation】DA-STC: Domain Adaptive Video Semantic Segmentation via Spatio-Temporal Consistency
- Paper: https://arxiv.org/pdf/2311.13254
- Code: https://github.com/ZHE-SAPI/DA-STC
6.【Multimodal】Soulstyler: Using Large Language Model to Guide Image Style Transfer for Target Object
- Paper: https://arxiv.org/pdf/2311.13562
- Code (coming soon): https://github.com/yisuanwang/Soulstyler
7.【Multimodal】PG-Video-LLaVA: Pixel Grounding Large Video-Language Models
- Paper: https://arxiv.org/pdf/2311.13435
- Code (coming soon): https://github.com/mbzuai-oryx/Video-LLaVA
8.【Multimodal】FusionFrames: Efficient Architectural Aspects for Text-to-Video Generation Pipeline
- Paper: https://arxiv.org/pdf/2311.13073
- Code: https://github.com/ai-forever/KandinskyVideo
9.【Multimodal】LiveChat: Video Comment Generation from Audio-Visual Multimodal Contexts
- Paper: https://arxiv.org/pdf/2311.12826
- Code: https://github.com/yy1lab/LiveChat
10.【Digital Humans】XAGen: 3D Expressive Human Avatars Generation
- Paper: https://arxiv.org/pdf/2311.13574
- Project page: XAGen - Project Page
- Code (coming soon): https://github.com/magic-research/xagen
11.【Depth Estimation】Camera-Independent Single Image Depth Estimation from Defocus Blur
- Paper: https://arxiv.org/pdf/2311.13045
- Code: https://github.com/sleekEagle/defocus_camind
12.【Diffusion】DiffusionMat: Alpha Matting as Sequential Refinement Learning
- Paper: https://arxiv.org/pdf/2311.13535
- Project page: DiffusionMat
- Code (coming soon): https://github.com/cnnlstm/DiffusionMat
13.【Object Counting】T-Rex: Counting by Visual Prompting
- Paper: https://arxiv.org/pdf/2311.13596
- Project page: T-Rex Counting
- Code (coming soon): https://github.com/IDEA-Research/T-Rex
14.【NeRF】PIE-NeRF: Physics-based Interactive Elastodynamics with NeRF
- Paper: https://arxiv.org/pdf/2311.13099
- Project page: PIE-NeRF
- Code: coming soon
15.【Image Synthesis】Diffusion360: Seamless 360 Degree Panoramic Image Generation based on Diffusion Models
- Paper: https://arxiv.org/pdf/2311.13141
- Code: https://github.com/ArcherFMY/SD-T2I-360PanoImage
CV Computer Vision Discussion Group
Experts in object detection, image segmentation, object tracking, Transformers, multimodal learning, NeRF, GANs, defect detection, salient object detection, keypoint detection, super-resolution reconstruction, SLAM, face analysis, OCR, biomedical imaging, 3D reconstruction, pose estimation, autonomous-driving perception, depth estimation, video understanding, behavior recognition, image dehazing, image deraining, image restoration, image retrieval, lane detection, point cloud object detection, point cloud segmentation, image compression, motion prediction, neural network quantization, model deployment, and other fields share technical knowledge, interview tips, and internal job referrals in the group from time to time.
To join the group, add the administrator on WeChat (ID: PingShanHai666). When sending the friend request, include a note in the format: school/company + research direction + nickname.
Recommended reading:
CV Computer Vision Daily Open-Source Code: Papers with Code Quick Overview (2023.11.22)
CV Computer Vision Daily Open-Source Code: Papers with Code Quick Overview (2023.11.21)
CV Computer Vision Daily Open-Source Code: Papers with Code Quick Overview (2023.11.20)