A Collection of Multimodal Papers

Surveys

  1. Multimodal Machine Learning: A Survey and Taxonomy
    Paper: https://arxiv.org/pdf/1705.09406.pdf
    Chinese translation: Multimodal Machine Learning: A Survey and Taxonomy (multimodal survey)
  2. Multimodal Learning with Transformers: A Survey
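    Paper: https://arxiv.org/pdf/2206.06488.pdf
    Chinese translation: "300+ references! A detailed look at the latest progress in Transformer-based multimodal learning" (incomplete; reading the original paper is recommended)
  3. A Survey of Open-Domain Dialogue Technology (开放型对话技术研究综述)
    Summary: A Survey of Open-Domain Dialogue Systems (开放型对话系统研究综述)
  4. A Survey of Research Progress on Natural Language Generation in Task-Oriented Dialogue Systems (任务型对话系统中的自然语言生成研究进展综述)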

Tutorials

  1. Vision-Language Pretraining: Current Trends and the Future
    Website: https://vlp-tutorial-acl2022.github.io/

Models

  1. Transformer
    Paper: Attention Is All You Need
  2. BERT
    Paper: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
    Video walkthrough by Mu Li (李沐): BERT, a paragraph-by-paragraph close reading [Paper Close Reading]
  3. ViLT
    Paper: ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision
    Source code: https://github.com/dandelin/vilt
    Video walkthrough by bryanyzhu: ViLT, a close reading [Paper Close Reading]
    Personal notes: [Paper & Model Walkthrough] ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision
  4. VL-BEiT
    Paper: VL-BEiT: Generative Vision-Language Pretraining
    Related paper: BEiT: BERT Pre-Training of Image Transformers
  5. CLIP
    Paper: Learning Transferable Visual Models From Natural Language Supervision
    Source code: https://github.com/OpenAI/CLIP
    Video walkthrough by bryanyzhu: CLIP, a paragraph-by-paragraph close reading [Paper Close Reading]
    Personal notes: [Paper & Model Walkthrough] CLIP (Learning Transferable Visual Models From Natural Language Supervision)
  6. VideoBERT
    Paper: VideoBERT: A Joint Model for Video and Language Representation Learning
    Source code: https://github.com/ammesatyajit/VideoBERT
    Personal notes: [Paper & Model Walkthrough] VideoBERT: A Joint Model for Video and Language Representation Learning
  7. Two-Stream Convolutional Networks for Action Recognition in Videos
    Paper: https://arxiv.org/abs/1406.2199
    Personal notes: [Paper & Model Walkthrough] Two-Stream Convolutional Networks for Action Recognition in Videos

Reposted from blog.csdn.net/Friedrichor/article/details/126939715