CV Computer Vision Daily Open-Source Code: Papers with Code Roundup, 2023.11.28

1. [Image Segmentation] Stable Segment Anything Model

Paper: https://arxiv.org//pdf/2311.15776
Code (coming soon): https://github.com/fanq15/Stable-SAM

2. [Object Tracking] Single-Model and Any-Modality for Video Object Tracking

Paper: https://arxiv.org//pdf/2311.15851
Code (coming soon): https://github.com/Zongwei97/UnTrack

3. [Video Super-Resolution] Enhancing Perceptual Quality in Video Super-Resolution through Temporally-Consistent Detail Synthesis using Diffusion Models

Paper: https://arxiv.org//pdf/2311.15908
Code (coming soon): https://github.com/claudiom4sir/StableVSR

4. [Multimodal] Video-Bench: A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models

Paper: https://arxiv.org//pdf/2311.16103
Code: https://github.com/PKU-YuanGroup/Video-Bench

5. [Multimodal] ViT-Lens-2: Gateway to Omni-modal Intelligence

Paper: https://arxiv.org//pdf/2311.16081
Code: https://github.com/TencentARC/ViT-Lens

6. [Multimodal] GaussianEditor: Editing 3D Gaussians Delicately with Text Instructions

Paper: https://arxiv.org//pdf/2311.16037
Project page: GaussianEditor: Editing 3D Gaussians Delicately with Text Instructions
Code coming soon

7. [Multimodal] EVCap: Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension

Paper: https://arxiv.org//pdf/2311.15879
Project page: EVCap: Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension
Code coming soon

8. [Multimodal] FlowZero: Zero-Shot Text-to-Video Synthesis with LLM-Driven Dynamic Scene Syntax

Paper: https://arxiv.org//pdf/2311.15813
Project page: FlowZero: Zero-Shot Text-to-Video Synthesis with LLM-Driven Dynamic Scene Syntax
Code (coming soon): https://github.com/aniki-ly/FlowZero

9. [Multimodal] GPT4Vis: What Can GPT-4 Do for Zero-shot Visual Recognition?

Paper: https://arxiv.org//pdf/2311.15732
Code: https://github.com/whwu95/GPT4Vis

10. [Multimodal] Breathing Life Into Sketches Using Text-to-Video Priors

Paper: https://arxiv.org//pdf/2311.13608
Project page: Breathing Life Into Sketches Using Text-to-Video Priors
Code (coming soon): https://github.com/yael-vinker/live_sketch

11. [Digital Humans] Animatable Gaussians: Learning Pose-dependent Gaussian Maps for High-fidelity Human Avatar Modeling

Paper: https://arxiv.org//pdf/2311.16096
Project page: Project page of Animatable Gaussians
Code (coming soon): https://github.com/lizhe00/AnimatableGaussians

12. [Autonomous Driving: Occupancy Prediction] OccWorld: Learning a 3D Occupancy World Model for Autonomous Driving

Paper: https://arxiv.org//pdf/2311.16038
Code: https://github.com/wzzheng/OccWorld

13. [Video Understanding] Side4Video: Spatial-Temporal Side Network for Memory-Efficient Image-to-Video Transfer Learning

Paper: https://arxiv.org//pdf/2311.15769
Code (coming soon): https://github.com/HJYao00/Side4Video

14. [Video Understanding] Vamos: Versatile Action Models for Video Understanding

Paper: https://arxiv.org//pdf/2311.13627
Project page: Vamos: Versatile Action Models for Video Understanding
Code coming soon

15. [Person Re-Identification] Multi-scale Semantic Correlation Mining for Visible-Infrared Person Re-Identification

Paper: https://arxiv.org//pdf/2311.14395
Code: https://github.com/Hua-XC/MSCMNet

16. [Diffusion] Continual Learning of Diffusion Models with Generative Distillation

Paper: https://arxiv.org//pdf/2311.14028
Code: https://github.com/Atenrev/difussion_continual_learning

17. [Knowledge Distillation] Knowledge From the Dark Side: Entropy-Reweighted Knowledge Distillation for Balanced Knowledge Transfer

Paper: https://arxiv.org//pdf/2311.13621
Code: https://github.com/cpsu00/ER-KD

18. [Continual Learning] Density Distribution-based Learning Framework for Addressing Online Continual Learning Challenges

Paper: https://arxiv.org//pdf/2311.13623
Code coming soon
