Computer Vision Daily Open-Source Code: Papers with Code Roundup (2023.12.7)

1. [Backbone Architecture: Transformer] Split & Merge: Unlocking the Potential of Visual Adapters via Sparse Training

Paper: https://arxiv.org//pdf/2312.02923
Code: https://github.com/Theia-4869/MoSA

2. [Backbone Architecture: Transformer] (NeurIPS 2023) Are Vision Transformers More Data Hungry Than Newborn Visual Systems?

Paper: https://arxiv.org//pdf/2312.02843
Code: https://github.com/buildingamind/ViT-CoT

3. [Backbone Architecture: Transformer] Class-Discriminative Attention Maps for Vision Transformers

Paper: https://arxiv.org//pdf/2312.02364
Code: https://github.com/lenbrocki/CDAM

4. [Semantic Segmentation] SAM-Assisted Remote Sensing Imagery Semantic Segmentation with Object and Boundary Constraints

Paper: https://arxiv.org//pdf/2312.02464
Code: https://github.com/sstary/SSRS

5. [Point Cloud 3D Object Detection] Diffusion-SS3D: Diffusion Model for Semi-supervised 3D Object Detection

Paper: https://arxiv.org//pdf/2312.02966
Code: https://github.com/luluho1208/Diffusion-SS3D

6. [Multimodal] GPT4Point: A Unified Framework for Point-Language Understanding and Generation

Paper: https://arxiv.org//pdf/2312.02980
Project page: GPT4Point
Code: coming soon

7. [Multimodal] BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models

Paper: https://arxiv.org//pdf/2312.02896
Code: https://github.com/AIFEG/BenchLMM

8. [Multimodal] Machine Vision Therapy: Multimodal Large Language Models Can Enhance Visual Robustness via Denoising In-Context Learning

Paper: https://arxiv.org//pdf/2312.02546
Code: https://github.com/tmllab/Machine_Vision_Therapy

9. [Multimodal] Lenna: Language Enhanced Reasoning Detection Assistant

Paper: https://arxiv.org//pdf/2312.02433
Code (coming soon): https://github.com/Meituan-AutoML/Lenna

10. [Multimodal] CLIPDrawX: Primitive-based Explanations for Text Guided Sketch Synthesis

Paper: https://arxiv.org//pdf/2312.02345
Project page: CLIPDrawX: Primitive-based Explanations for Text Guided Sketch Synthesis
Code: coming soon

11. [Multimodal] A Contrastive Compositional Benchmark for Text-to-Image Synthesis: A Study with Unified Text-to-Image Fidelity Metrics

Paper: https://arxiv.org//pdf/2312.02338
Code: https://github.com/zhuxiangru/Winoground-T2I

12. [Multimodal] PixelLM: Pixel Reasoning with Large Multimodal Model

Paper: https://arxiv.org//pdf/2312.02228
Project page: PixelLM: Pixel Reasoning with Large Multimodal Model
Code (coming soon): https://github.com/MaverickRen/PixelLM

13. [Multimodal] Behind the Magic, MERLIM: Multi-modal Evaluation Benchmark for Large Image-Language Models

Paper: https://arxiv.org//pdf/2312.02219
Code (coming soon): https://github.com/ojedaf/MERLIM

14. [Digital Humans] FlashAvatar: High-Fidelity Digital Avatar Rendering at 300FPS

Paper: https://arxiv.org//pdf/2312.02214
Project page: FlashAvatar
Code: coming soon

15. [Self-Supervised Learning] Local Masking Meets Progressive Freezing: Crafting Efficient Vision Transformers for Self-Supervised Learning

Paper: https://arxiv.org//pdf/2312.02194
Code: https://github.com/utkutpcgl/ViTFreeze

16. [Data Augmentation] GeNIe: Generative Hard Negative Images Through Diffusion

Paper: https://arxiv.org//pdf/2312.02548
Code (coming soon): https://github.com/UCDvision/GeNIe

17. [Depth Estimation] PatchFusion: An End-to-End Tile-Based Framework for High-Resolution Monocular Metric Depth Estimation

Paper: https://arxiv.org//pdf/2312.02284
Project page: PatchFusion: An End-to-End Tile-Based Framework for High-Resolution Monocular Metric Depth Estimation
Code: https://github.com/zhyever/PatchFusion

18. [Autonomous Driving] WoVoGen: World Volume-aware Diffusion for Controllable Multi-camera Driving Scene Generation

Paper: https://arxiv.org//pdf/2312.02934

19. [Diffusion] X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model

Paper: https://arxiv.org//pdf/2312.02238
Project page: X-Adapter
Code (coming soon): https://github.com/showlab/X-Adapter

20. [Diffusion] Portrait Diffusion: Training-free Face Stylization with Chain-of-Painting

Paper: https://arxiv.org//pdf/2312.02212
Code: https://github.com/liujin112/PortraitDiffusion

21. [Video Editing] Drag-A-Video: Non-rigid Video Editing with Point-based Interaction

Paper: https://arxiv.org//pdf/2312.02936
Project page: Drag-A-Video
Code: coming soon

22. [Video Editing] SAVE: Protagonist Diversification with Structure Agnostic Video Editing

Paper: https://arxiv.org//pdf/2312.02503
Project page: SAVE: Protagonist Diversification with Structure Agnostic Video Editing
Code (coming soon): https://github.com/ldynx/SAVE

23. [Video Editing] DragVideo: Interactive Drag-style Video Editing

Paper: https://arxiv.org//pdf/2312.02216
Code (coming soon): https://github.com/RickySkywalker/DragVideo-Official

24. [Human Motion Generation] Generating Fine-Grained Human Motions Using ChatGPT-Refined Descriptions

Paper: https://arxiv.org//pdf/2312.02772
Project page: Generating Fine-Grained Human Motions Using ChatGPT-Generated Descriptions
Code: coming soon

25. [Human Motion Generation] EMDM: Efficient Motion Diffusion Model for Fast, High-Quality Motion Generation

Paper: https://arxiv.org//pdf/2312.02256
Project page: EMDM: Efficient Motion Diffusion Model for Fast, High-Quality Human Motion Generation
Code: coming soon

26. [NeRF] PointNeRF++: A multi-scale, point-based Neural Radiance Field

Paper: https://arxiv.org//pdf/2312.02362
Project page: PointNeRF++: A multi-scale, point-based Neural Radiance Field.
Code: coming soon

27. [NeRF] WavePlanes: A compact Wavelet representation for Dynamic Neural Radiance Fields

Paper: https://arxiv.org//pdf/2312.02218
Code: https://github.com/azzarelli/waveplanes/

28. [Human Reconstruction] HumanNeRF-SE: A Simple yet Effective Approach to Animate HumanNeRF with Diverse Poses

Paper: https://arxiv.org//pdf/2312.02232
Project page: HumanNeRF-SE
Code (coming soon): https://github.com/Miles629/HumanNeRF-SE

29. [3D Reconstruction] ImageDream: Image-Prompt Multi-view Diffusion for 3D Generation

Paper: https://arxiv.org//pdf/2312.02201
Project page: ImageDream: Image-Prompt Multi-view Diffusion for 3D Generation
Code (coming soon): https://github.com/ImageDream-byte/ImageDream
